Modeling Coral Reef Resilience in the Republic of Palau
Coral reefs across the planet are threatened by the rising seawater temperatures driven by climate change. Although many corals indeed "bleach," and consequently perish, as a result of prolonged exposure to abnormally high temperatures, some species (or even genotypes within a single species) maintain a marked level of climate resilience. Historically, we have identified these "super corals" in post-hoc fashion: searching through the proverbial rubble of a highly impacted reef to find the survivors. For coral reef restoration and other purposes, a more targeted, proactive means of identifying climate-resilient corals would be preferred to this "needle in a haystack" approach. To this end, I showcase a rich coral eco-physiological data set acquired during a month-long research expedition to the most remote corners of the Micronesian nation of Palau. After some rudimentary data processing and visualization, I show, using JMP Pro 17, how predictive models of coral resilience can be built relatively easily. I then demonstrate how GUIs derived from the models' prediction profilers can be embedded on web pages so that they can be used by scientists as a planning tool. Specifically, the model-based profilers allow researchers to predict the environmental conditions (e.g., depth, type of coral reef, salinity) at which they are most likely to find resilient corals during their bioprospecting surveys. This analytical tool will therefore aid marine biologists in locating corals with high climate tolerance that should be propagated in efforts to restore degraded reefs.
Hey, everybody. Thanks for tuning in. My name's Anderson Mayfield, and I'm a Coral Reef Scientist and long-term user and advocate of JMP. Over the next 30–35 minutes, I'm going to talk to you about modeling coral reef resilience in the Republic of Palau. Unfortunately, the underlying motivation for pretty much all the research I've been doing over the last 20 years or so on coral reefs is not only the fact that they're beautiful and just amazing places to behold, but that they're highly threatened, particularly by rising seawater temperatures associated with climate change.
What happens is the ocean gets too hot for these corals. They have a delicate symbiosis with dinoflagellates that live within their cells, and when the water is too hot for too long, this causes the corals to bleach, and they essentially starve to death and perish. Unfortunately, even as we speak, the 4th global coral bleaching event is well underway.
Coral reefs are actually not very well studied. They're less than a percent of the entire ocean area, despite being very high biodiversity regions. So we don't actually even know where some of the richest coral areas are. We can guess based on satellite imagery, but we really need to stick our heads underwater, either throw on a mask and snorkel or some scuba gear preferably, and see what's down there. But we've run out of time to comprehensively and holistically survey all the planet's coral reefs. We need to expedite how we find areas that are particularly coral rich.
Similarly, even though I mentioned most corals are very sensitive to high temperatures, there are others that, for whatever reason, maintain resilience when the seawater quality deteriorates. We need to find these corals so we can understand what they are doing differently. We can propagate them to try to restore reefs, but our usual approach is what I call the needle in a haystack approach. We just go underwater, we sift through the rubble, and we just hope we find some of these by luck.
We need to have a more targeted, data-driven way of identifying climate-resilient corals. These are the kinds of questions I've been trying to address using JMP. I call this using predictive modeling to address key knowledge gaps in JMP Pro. I've got a couple of questions I'm going to throw up here. Questions 3 and 4 are super important and really near and dear to my heart, but we're not going to have time to go through those today. They're going to be in the slide deck and in the data I provide on my little web page, and you're also welcome to email me if you want more details on them. I've got scripts embedded in the data tables and a step-by-step guide about how you would go about answering them.
Instead, I'm going to focus today's talk on questions 1 and 2. Can we use pre-existing data to predict where there are other reefs with really high percentages of living coral? These are the reefs that are going to be more valuable in terms of serving as a habitat for important fish species, in terms of protecting our shores from wave energy. They're not necessarily going to be the strongest, but they're maybe the most valuable.
Question number 2, on the other hand, is getting at this idea of where we are going to find these 'super corals', the ones that, for whatever reason, seem to maintain resilience in the face of climate change and other stressors. The reason I want to focus my talk today on this Palau data set is because it was a really amazing opportunity with the Living Oceans Foundation. We had a big team of scientists capturing what I call a molecules-to-satellites data set.
That's my pretentious way of saying we collected data from every scale imaginable. We have satellite data from outer space. We have the ship making transects, filming the benthos, characterizing the seabed. We have scuba divers like myself seeing what's living on the reef. Then my main task was actually sampling the corals to try to get an idea of their health and resilience. It's a really rich data set that I think will be useful for answering some of these questions that I've pitched.
I use the term bioprospecting because we're trying to leverage the capacity of these data to make predictions about reefs that we maybe didn't study. We were there for a month and we got this amazing data set, but still, we really only scratched the surface of what's there, so we're using predictive modeling to fill in the gaps.
The first half of the talk or probably the next big chunk, we'll say, is trying to identify areas of high coral cover. This is our way of saying areas with a large abundance of the reef building corals, the ones that actually make the reef. We want to try to identify high coral cover reefs that maybe we haven't seen yet. The types of environmental predictors we're using are things like seawater quality, the spatial locations, and other habitat parameters.
I do want to mention, in case you're wondering, why you need to go through this exercise. Can't you just look at satellite imagery or go on Google Maps and estimate where the corals will be? You would actually be right. That gets you in the right ballpark of where the reefs are located, but it's not telling you which ones are actually alive. A living reef and a dead reef will actually look very similar in a satellite image, so we actually need to rely on diver data to verify what's alive and what's not.
Most of you probably won't be familiar with Palau, but it's a really beautiful small island nation in Micronesia. Pretty much the only reason westerners at least know about it is because it's a diving paradise, and it certainly still is. I don't want to paint too grim a picture and say, everything's reduced to rubble. But on the other hand, these reefs are certainly not pristine. They're facing the same challenges reefs are around the globe, particularly with high temperatures. With that kind of preamble, let's finally open up JMP.
I'm running JMP Pro 18. The first thing I want to do, and this is something JMP really excels at, is really just exploring my data. You'll notice in this stripped down data table, really only got five columns. I've got the sites. We visited 86 reef sites. Got latitude and longitude. I've got some representative photos that I'll show you in a minute. I've got the coral cover, which is what we're going to ultimately be trying to predict later.
Let's just pull Palau up on a map to give you some context. I've opened Graph Builder here, and you'll notice I don't even need to position latitude and longitude on the correct axes. JMP intuitively knows where to put them. Turn off this line. But you're not seeing Palau yet because I have yet to add a background map. Because I'm in Saudi Arabia now, I've been having some issues syncing up with some of these servers, but I think using this simple street map server will suffice for now.
These black dots are showing us our dive sites. You'll notice there's some bias here. We really only were able to traverse the western side of the country because of the prevailing wind conditions, so that's certainly a caveat to keep in the back of your head. What I want to do now is I want to actually overlay the coral cover data onto the map. I'm ready to close this window here, still not where I want to be yet though.
For one, I had my Legend turned off. Let's turn the Legend back on. I actually want to change the color scheme. I don't want red to be high; I want green to be high, because I want to highlight this idea that a high coral cover reef is desirable.
You'll notice that I made a mistake here. What I actually need to do is I'm not really interested in having the latitude plotted, so what I'm going to do, I'm going to drag this heat map over, and then for the points, which were the latitude, I'm going to remove them. I'm going to go back to my coral cover here and change it back to having green as high. Getting closer, but you've now pretty much lost the island because the pixels are too large. This is a trick I stumbled upon a few years ago.
If you're using a heat map to overlay data onto a map, you can actually force a grid by changing the axis increments to the sizes you want and then turning on the grid lines. Now we're cooking. We're getting something that's looking more like what I'd want to have in maybe a publication. But I'm seeing too many lines still, so I'm actually going to turn the grid lines back off. Center this a little better. This is looking pretty good. I could tinker around with the axes more, clean them up.
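As an aside for readers following along outside of JMP, the idea behind that grid trick can be sketched in a few lines of Python. This is only an illustration of the concept: binning sites into fixed-size latitude/longitude cells and averaging coral cover within each cell. The function names, the 0.1-degree cell size, and the three sites are made up for the example; they are not the expedition data, and this is not how Graph Builder is implemented.

```python
import math
from collections import defaultdict

def grid_cell(lat, lon, step=0.1):
    """Return the (row, col) index of the grid cell that contains a point."""
    return (math.floor(lat / step), math.floor(lon / step))

def mean_cover_by_cell(sites, step=0.1):
    """Average coral cover (%) over all sites that fall in the same cell."""
    buckets = defaultdict(list)
    for lat, lon, cover in sites:
        buckets[grid_cell(lat, lon, step)].append(cover)
    return {cell: sum(values) / len(values) for cell, values in buckets.items()}

# Made-up sites: (latitude, longitude, coral cover %). Not the real survey data.
sites = [(7.31, 134.42, 60.0), (7.34, 134.44, 40.0), (7.95, 134.61, 20.0)]
cells = mean_cover_by_cell(sites, step=0.1)

for cell, cover in sorted(cells.items()):
    print(cell, cover)
```

A smaller step gives smaller cells, which is the same trade-off as shrinking the axis increment so the island stays visible under the colored squares.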
But, really, the point of this right now is to do some exploration. What I think you'll notice here is that we see a little bit more red and orange up here in the far north. These are areas with lower coral abundance. Got an area over here where we're seeing fewer corals. And we've got quite a bit of variety or variability, I should say, here in the south.
Where you're seeing the most coral rich reefs is in this west by southwest region. I want you to keep that in the back of your mind. I also want to show you something else, because I keep saying coral cover, coral cover, coral cover. This may be an abstract concept to those of you who are not marine biologists. What I've done here is for a couple of these sites, I've actually uploaded what I consider to be a representative photo. You can see here, this is a reef with about 46% coral cover. That's about the mean for the nation, as we'll see here in a minute. This is maybe not the best image to highlight this because this looks absolutely blanketed with corals to the untrained eye, but some of these things are actually not building reefs. They're not contributing to the construction of this framework. It's a little bit deceiving. It's not to say that this is not a beautiful site, but this is not as coral rich as it gets, as we'll see.
Let's close that for now. Now let's go over to the more comprehensive data set, so it's still 86 rows, still have the latitude and longitude and coral cover, but here we'll see additional environmental parameters that I think will influence the coral cover, and we'll get over to this side of it later. These are also things like seawater quality.
My question is, can I predict where the areas of high coral cover are that we haven't yet seen? One way you might want to explore this, besides just plotting on a map, is with this handy feature called the predictor screen. What I'm going to do here, I'm going to take my coral cover as my Y, and then I'm going to choose a subset of these environmental factors that I anticipate might influence coral cover as my X. You get this very intuitive, easy to interpret plot. The predictor that contributed the most to the variation in coral cover was latitude, and this makes sense. We were seeing those lower coral cover levels in the north, higher in the south. Sorry, lower in the far south, but maybe a sweet spot in the middle, and this is corroborated by this predictor screen.
I should mention that for years I misinterpreted this. I took this to mean this is the proportion of the variation explained by latitude. This is actually not entirely true. Instead, this is the proportion of times latitude was included in a boosted tree model, I believe. It may not mean that 33% of the variation is explained by latitude, but what you can say is that latitude, for instance, has about double the influence of temperature. Whereas these parameters, which I actually would have hypothesized ahead of time would have been significant drivers of variation in coral cover, tend to not have a large influence. Actually, to highlight that, this is something I almost didn't do because it's essentially negative data, but it's such a handy feature.
If I wanted to get a graphical output of what we just saw, aside from looking at the map, another thing I can do is stack the data for the things that are living on the sea floor. Basically, you can bin everything the divers see into these four categories: hard coral, algae, other invertebrates (so other things like sponges), and then the barren substrate. It's not truly barren, but this is the part of the substrate that's not alive, so sand, rocks, things of that nature. These four will sum to 100%.
What I'll do is I'll stack them, and the reason I want to stack them is because this is going to allow me to make what I consider to be a really compelling graph. Now that I've got all these benthic data stacked into the same column, I can drag them over as my Y. I'm going to have this overlay. I'm going to turn on the legend, and I'm going to convert it to a bar chart. We're still not quite there yet because I want to highlight the fact that these four entities sum to 100%. So I just go over and turn this into a stacked bar chart. They sum to 100, and this emphasizes something that I didn't mention earlier.
I did mention that 45% is the mean coral cover for the sites we surveyed. This is actually very high by international standards, so that's a good thing for Palau. They've still got a lot of living coral. The bad news is 40% algae cover is also really high. If you know your basic marine ecology, on a reef you've got this struggle between corals and algae. They're both fighting for space. You want to have higher coral at the expense of algae on a reef. It's not to say that algae are always a bad thing in the ocean, but on coral reefs, they're competing with corals, so you've got to take the bad with the good in this case. That's not what I wanted to show you. What I wanted to show you is just how easily I can look at how this assemblage differs across different environmental factors that I might be interested in. This took me a minute, and this is something I would have no problems with probably just pasting into a Word document and submitting with my paper.
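For anyone who wants to see what stacking does to the table itself, here is a minimal Python sketch of the same reshaping: one wide row per site becomes one long row per site and benthic category, followed by a check that each site's four categories still sum to 100%. The `stack` helper, the column names, and the percentages for the two sites are hypothetical and chosen only for illustration.

```python
def stack(rows, id_col, value_cols):
    """Turn one wide row per site into one long row per (site, category)."""
    return [
        {id_col: row[id_col], "category": col, "percent": row[col]}
        for row in rows
        for col in value_cols
    ]

CATEGORIES = ["hard coral", "algae", "other invertebrates", "barren substrate"]

# Made-up percentages for two hypothetical sites.
wide = [
    {"site": "A", "hard coral": 45, "algae": 40,
     "other invertebrates": 5, "barren substrate": 10},
    {"site": "B", "hard coral": 30, "algae": 50,
     "other invertebrates": 8, "barren substrate": 12},
]
long_rows = stack(wide, "site", CATEGORIES)

# Each site's four categories should still sum to 100% after stacking.
totals = {}
for r in long_rows:
    totals[r["site"]] = totals.get(r["site"], 0) + r["percent"]

print(len(long_rows), totals)
```

With the data in this long form, a stacked bar chart only needs three things: a grouping column (site or reef type), the category column for color, and the percent column for bar height.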
For instance, here, we've got three different types of coral reef. We can see how coral cover is pretty much the same, 45% for all these reef types, and that syncs up with what we saw on the predictor screen. Reef type was not actually a very useful predictor, nor was exposure. This is pretty much the wave energy that's hitting the reefs. You see it affects other things like there's way less algae on the protected reefs. This is actually the opposite of many parts of the world. Suffice to say to an ecologist, this figure is very rich and very informative, and it took me 30 seconds to make it, so that's a little bit of an aside. I think it's a really cool feature. Stacking data and using it to make these, what I call, benthic assemblage plots.
Remember, we're doing all this because we want to build a model for coral cover. After I've explored the data a little bit, I've got a handle on drivers of variation. I'm going to gravitate over to the predictive modeling part of JMP Pro, and use this feature that came out a few versions ago that I really like, it's called the model screen. What this is going to do, it's going to allow me to test a large number of different modeling types, simultaneously.
What I'm trying to predict is coral cover. I've got a validation column that's effectively split my 86 sites into 75% training data, 25% validation data. I'm going to turn on some of these additional options. Some of these two-way interactions and quadratics may not make sense, but let's leave them on for now. I've got these predictors, the things that we explored earlier, that I think might influence coral cover. Some of them we clearly could have removed already as not being very important, but don't worry about that for now.
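To make the validation column concrete, here is a small Python sketch of a random 75/25 split of 86 rows. It is only an assumption about the general idea; the function name and the seed are invented for the example, and JMP offers its own ways of building validation columns (including stratified ones) that this does not reproduce.

```python
import random

def make_validation_column(n_rows, validation_fraction=0.25, seed=1):
    """Label each row 'Training' or 'Validation' at random, with a fixed seed."""
    n_validation = round(n_rows * validation_fraction)
    labels = ["Validation"] * n_validation + ["Training"] * (n_rows - n_validation)
    random.Random(seed).shuffle(labels)  # seeded so the split is repeatable
    return labels

validation_column = make_validation_column(86)
print(validation_column.count("Training"), validation_column.count("Validation"))
```

With only 86 sites, the validation set is about 22 reefs, so which reefs land in it can move the validation statistics quite a bit from one split to the next.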
Let's look at our validation R-Square, so how well the model did at predicting the coral cover of our validation data set. One thing you'll notice is it didn't really do that well. We're seeing R-Square values of around 0.3, and these tend to be more of the complex machine learning modeling types. Model screening is not necessarily for giving you the model that's going to be in your paper, for instance; it's getting you in the ballpark of where you want to go explore. I know that for some of these really complex ecological data sets, neural nets tend to do really well.
Beforehand, I identified a neural net, or neural network, featuring only seven environmental factors, that had a validation R-Squared of about 0.72, which, if you're trying to predict things in the ocean, I actually think is a pretty good number. This will take a couple of seconds to run, so I want to jump back over into PowerPoint and show you this. This model was uncovered because of a really clever add-in developed by a really keen, savvy JMP user, Diedrich Schmidt. If you know anything about neural networks, there are tons of different hyperparameters that you can tinker with that could actually have pretty large impacts on the model performance, and he has a really nice, generalized auto-tuning add-in for being able to test large numbers of various combinations of these hyperparameters. These are things like number of hidden layers, the degree of boosting when you've got a single layer, the types of activation nodes, things of that nature.
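For reference, the statistic being compared across models can be written out in a few lines. This sketch uses the usual definition, one minus the residual sum of squares over the total sum of squares, computed on held-out rows; I am assuming the value JMP reports for the validation set follows that definition. The coral cover numbers are invented.

```python
def r_squared(actual, predicted):
    """1 - SS_residual / SS_total for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total

# Made-up coral cover (%) for four validation reefs and a model's predictions.
actual_cover = [20, 40, 60, 80]
predicted_cover = [30, 40, 60, 70]
print(r_squared(actual_cover, predicted_cover))
```

A model that always predicts the mean of the validation reefs scores 0 on this scale, which is why values around 0.3 are not much to get excited about.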
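The general idea of auto-tuning can be sketched as an exhaustive search over a grid of settings. To be clear, this is not the add-in's code or its algorithm; the grid values, the function names, and especially `toy_score` are hypothetical. In real use the score function would fit a network with those settings and return its validation R-Squared.

```python
from itertools import product

# Hypothetical grid; real tuning grids would be much larger.
grid = {
    "hidden_layers": [1, 2],
    "nodes_per_layer": [3, 6, 12],
    "activation": ["tanh", "linear", "gaussian"],
}

def combinations(grid):
    """Yield every combination of settings in the grid as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def best_setting(grid, score):
    """Return the combination with the highest score."""
    return max(combinations(grid), key=score)

def toy_score(setting):
    """Stand-in for 'fit the network and return its validation R-Squared'."""
    score = -abs(setting["nodes_per_layer"] - 6)
    score -= 0.1 * (setting["hidden_layers"] - 1)
    if setting["activation"] == "tanh":
        score += 0.05
    return score

print(sum(1 for _ in combinations(grid)))
print(best_setting(grid, toy_score))
```

Even this tiny grid has 18 combinations, which is why automating the search is so much nicer than clicking through the settings by hand.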
Let's see. Let's explore this neural network a little bit. One thing you may notice that is maybe a little bit concerning is this training R-Squared is quite a bit lower than the validation one. You might be impressed by the validation one and say, "Hey, 0.88 is amazing for an ecological data set." This could definitely be evidence for overfitting. But for now, let's just chalk that up to maybe this run didn't work quite as well as others, and let's just pretend like we're happy with this model, because what I really want to showcase is a couple of things.
First off, in the prediction profiler, I'm going to turn off this assess variable importance because I want to redo it. This is essentially ordering the environmental factors, in this case, relative to their influence in the model. The output is actually going to look quite similar to the predictor screen. In this case, since I've run this model before, it was already ordered as such. But when you run this variable importance analysis, it will actually reorder your predictors in the prediction profiler with respect to their total effect size. You could see in this case, longitude actually had a greater impact than latitude, so the order you see here from top to bottom matches the order here.
Surprisingly, even though we didn't see much of an influence of exposure in the plot earlier, it's still ranked above some of these others, so that's one cool thing. The spatial coordinates make sense based on what we saw from the map. We didn't look too much at reef emergence, but I plotted it briefly. You could see very little impact on coral cover, and this is showing us a value that's roughly the mean for the whole country, or a little bit lower.
But what I want to do is, say I was happy with this model and I didn't have any signs of overfitting, and it had passed my various criteria. What I want to do, and this is what I think is super cool, is do this desirability analysis. It remembers this because I've done this before, but I want to tell JMP to maximize coral cover. Give me the conditions in which, based on this model, at least, you would be expected to have maximum coral cover. Then I'm going to maximize desirability, and that value jumped up to 73%.
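Conceptually, maximizing desirability for a single response you want to be large amounts to searching the factor settings for the highest predicted value. Here is a minimal Python sketch under stated assumptions: `toy_model` is a made-up stand-in for a fitted model (not the neural network from the talk), the candidate settings are invented, and the straight-line desirability function is a simplification of the smoother curves a profiler would use.

```python
from itertools import product

def toy_model(depth_m, exposure):
    """Hypothetical stand-in for a fitted model; returns coral cover (%)."""
    cover = 60.0 - 0.2 * (depth_m - 10) ** 2
    if exposure == "exposed":
        cover += 8.0
    return cover

def desirability(cover, low=0.0, high=100.0):
    """Linear 0-to-1 desirability for a 'maximize' goal (a simplification)."""
    scaled = (cover - low) / (high - low)
    return min(1.0, max(0.0, scaled))

def maximize_desirability(depths, exposures):
    """Return the (depth, exposure) pair with the highest desirability."""
    candidates = product(depths, exposures)
    return max(candidates, key=lambda c: desirability(toy_model(*c)))

best = maximize_desirability([5, 10, 15, 20], ["exposed", "protected"])
print(best, toy_model(*best))
```

A grid search like this only works for a handful of factors; with seven predictors, some of them continuous, a numerical optimizer is the practical choice, but the goal is the same.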
Let me go back to the slide. I have this distribution plot. I've done the coral cover across the whole country. We have this mean of 45 that we mentioned earlier, but the range is actually very wide. It's from 2% to 78%. You might wonder why the model didn't just recommend that we go back to that reef that had 78%. The reason is because we actually have removed site and, in this case, island from the model, because we don't want to bias it with… We want to have it be more generalizable for the country.
Otherwise, if you just add in site, island and things like this, it could very well tell you to go back to where you found the highest coral cover. That's going to be the highest probability of being correct, but we're trying to make a more generalizable model here to give us an insight about places we haven't caught yet. 73% is an astonishingly high coral cover, but not completely out of the realms of possibility. I mention this because when you do these kinds of desirability analyses with environmental data, extrapolation is definitely a concern.
In this case, I've looked at this ahead of time. None of these conditions are ridiculous to me. These are all conditions a marine biologist would say, "Hey, look, I can totally envision an exposed fore reef outside the lagoon that's emergent with this mean site depth." That's certainly possible for a coral reef. I've had other models I've done, where if you don't turn on the extrapolation control, it will give you conditions that haven't been experienced on the planet since before the industrial revolution. When you're trying to make predictions, especially in a dynamic system, it behooves you to use these extrapolation controls. I didn't get a warning when I turned the warning on, but I could. What will happen, if you do have issues with extrapolation, you can rerun the model.
In this case, it barely changed it because there's no issues, but what you will notice here, is some of these lines became fixed. If you know anything about coral reefs, you know that a fore reef is always going to be on the outside of the lagoon. It's not even going to let me drag this over because it's saying, "Hey, we're not going to let you simulate a condition that doesn't exist in your dataset." Or in this case, it doesn't exist in nature. This is actually something that's really cool.
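The spirit of that guardrail can be illustrated with a deliberately simple check: flag a candidate reef if a numeric factor falls outside the observed range, or if its combination of categorical levels never occurred in the data. This is an assumption-laden simplification for illustration only. It is not how JMP's extrapolation control is computed, and the column names and three observed reefs are invented.

```python
def build_limits(rows, numeric_cols, categorical_cols):
    """Record observed numeric ranges and observed categorical combinations."""
    ranges = {c: (min(r[c] for r in rows), max(r[c] for r in rows))
              for c in numeric_cols}
    combos = {tuple(r[c] for c in categorical_cols) for r in rows}
    return ranges, combos

def is_extrapolation(candidate, ranges, combos, categorical_cols):
    """True if the candidate lies outside anything seen in the data."""
    for col, (low, high) in ranges.items():
        if not low <= candidate[col] <= high:
            return True
    return tuple(candidate[c] for c in categorical_cols) not in combos

# Made-up observations, not the survey data.
observed = [
    {"reef_type": "fore reef", "lagoon": "outside", "depth_m": 12.0},
    {"reef_type": "back reef", "lagoon": "inside", "depth_m": 6.0},
    {"reef_type": "patch reef", "lagoon": "inside", "depth_m": 3.0},
]
CATS = ["reef_type", "lagoon"]
ranges, combos = build_limits(observed, ["depth_m"], CATS)

fore_inside = {"reef_type": "fore reef", "lagoon": "inside", "depth_m": 10.0}
fore_outside = {"reef_type": "fore reef", "lagoon": "outside", "depth_m": 10.0}
print(is_extrapolation(fore_inside, ranges, combos, CATS))
print(is_extrapolation(fore_outside, ranges, combos, CATS))
```

The fore reef inside the lagoon gets flagged because that pairing never appears in the observations, which mirrors the profiler refusing to let you drag to a condition that doesn't exist.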
What I would do, again, if I was happy with this model is… you know what I'm going to do. I'm going to, not cheat, but I'm going to run it again, and the reason why will become apparent. What I want to do is actually put this model up on my website. I know if you do the desirability analysis, it basically will fix those conditions. That might be desirable, but in my case, I want other users to be able to enter their own data, maybe for reefs they want to check out, and then it will give them an output of what the coral cover might be.
Another thing I meant to do. Let's see. No. I took the map down. I was actually going to use those coordinates and see where exactly those latitudinal and longitudinal coordinates would fall within Palau, but I think we'll save that for another time. But I do recall when I've done this simulation earlier, it was somewhere in that southwestern region of the country, where we tended to see higher coral cover before.
You could see this is actually fairly similar, this rerun of this model. But what I want to do is just pretend we're happy with this model. Actually, want to publish it to JMP Public, and hopefully this won't take too long because I'm already logged in. I'm going to put it in this JMP Discovery folder. I could add additional information here. If I wanted to, I could append a picture.
Let's see what this looks like here. Hopefully, it's interactive. Sometimes when I run it and it still remembers that I'm interested in maximizing coral cover, it will keep the desirability analysis turned on, which is actually great most of the time, but sometimes it can be problematic if you're wanting to post the interactive HTML. It will actually take a few moments to generate the report, so we'll go back to it in a second, but I'll show you, because I've done this ahead of time, what it will look like.
This is my website. I've got a Palau section, and I've got a unique page that I'm calling Palau Bioprospecting, where I'm basically… I'm not just doing this as a proof of concept for JMP. These are actually findings that I'm very interested in. You can see it's not the exact same model, but it's a similar model to the one I just ran. Say I've got collaborators and colleagues who work in Palau who say, "Hey, we found this reef on the map. It's a submerged reef inside the lagoon. It's a back reef or let's say it's a lagoonal patch reef. It's shallow, and it would be protected in that case. We found it's about here, and location is about here."
That's maybe not a great example because that's showing us about the mean for the whole country, so the model's guess would be, "Hey, go check that place out." It would be as good of a reef as any, and you could optimize the model, help train it better by putting in the amount you actually saw.
What I've done here is I actually put in a static view of the optimized desirability. You could see it's a little bit different because it varies from run to run. But in this case, these are the conditions that are projected based on my model, that would lead to a reef of 67% coral cover, which again is really high. So how to do that?
This is what it looks like. Again, the model's slightly different, but it did post correctly. It's being interactive. All I have to do… Let's see if this is in the way. I could do two things. I could either take the embed code of the iframe or the card. I prefer iframe because the card, what it does, it basically just puts this little stub here, which is essentially a glorified hyperlink that will take you to the JMP Public page, which might be all you need. But if you actually want to have this embedded window where you can have users go in there and put in their own data, then you want to use the iframe.
I'm using Squarespace, but literally, it's as easy as just you can either do code or embed, put the code here. I've got it pasted off the JMP Public website. It's going to take a few seconds to load and then, boom, save this, do a refresh and then send this link to my colleagues. If they're JMP users, great, they could get way more detail. If not, they can go ahead and start testing hypotheses, and trying to get estimates for things like coral cover, using this feature.
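If you would rather generate that embed snippet than paste it by hand, a sketch in Python looks like this. The URL below is a placeholder and the `iframe_embed` helper is hypothetical; the real embed code should be copied from the published report's page, and its exact attributes may differ from what is shown here.

```python
from html import escape

def iframe_embed(url, width="100%", height="600"):
    """Build a plain HTML iframe tag, escaping the attribute values."""
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(url, quote=True),
        escape(width, quote=True),
        escape(height, quote=True),
    )

# Placeholder address, not a real published report.
REPORT_URL = "https://example.com/reports/coral-cover-profiler"
snippet = iframe_embed(REPORT_URL)
print(snippet)
```

The resulting string is what goes into the website builder's code or embed block; the card option, by contrast, is essentially just a link to the report page.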
This is something that has probably been around and possible in JMP for years, but it really floored me that you could have an interactive prediction profiler up on a website that easily. There's just huge potential for all these questions that I've been wanting to answer.
In the interest of time, I'm not going to go through the super corals example because you could see it takes a couple of minutes to get there. But just as a couple of spoilers, it's essentially the very same kind of process. If you're going through your predictive modeling, instead of looking at coral cover, you're using this coral health index that I generated a couple of years ago. You can see how much goes into it. This is very high resolution, but it also is very expensive. There's no way you're going to be able to measure all these things in millions and millions of corals.
Again, you're needing to use predictive modeling so that you can ideally have cheaper proxies that will tell you, rather than spend thousands of dollars measuring these fancy molecular things, we're relying on these predictive models to say, "Hey, if you want a coral that has a 99% chance in this example of being a 'super coral', it's going to be this size on a reef with very high coral cover, with low coral diversity, etcetera, etcetera." It's a very similar analytical process in JMP Pro to get at a different question.
Where are these super corals found? Then, again, I could put it up on my website, and this would help people plan. We have projects now: the institute I'm working with, CORDAP, is working with people in Palau who are interested in finding these super corals because they want to propagate them in their coral nurseries. These prediction profilers that are going to come out of these JMP Pro driven models are going to be super useful in helping them hone in on where these reefs might be.
If you're interested in this topic, I encourage you to go through the presentation. I've skipped over a bunch of slides. I've got all the scripts in the JMP data table, so you can do this analysis yourself. I think there's some really powerful things we can do now with the kinds of data, the types of data that are coming in from coral reefs. Doing things like optimizing marine conservation planning.
I think I've really benefited from working with JMP and becoming familiar with the software. It's really helped me to answer some of these big questions that are getting at the heart of things like climate change and the fate of the planet. I think you definitely have this world-is-your-oyster feel as you become more familiar with JMP, and particularly JMP Pro, if you're doing a lot of predictive model building. I definitely encourage people to get out there and explore and really try to tackle some of these big questions, because you'll be surprised how much you can do.
With that, I want to wrap this talk up, but thank you for your attention, and certainly feel free to email me if you've got questions about anything I covered. Thanks.