It's Otterly Confusing! Short-Clawed, Hairy-Nosed, Smooth-Coated or Eurasian? ...

Hi, everyone.

Thank you for joining us.

The title of our talk today

is It's Otterly Confusing! Short-Clawed, Hairy-Nosed, Smooth-Coated, or Eurasian?

Just ask JMP!

These four species occur in the same habitats in Asia, in Eurasia,

and the question is, how do you tell them apart?

Using our FIT technique, the footprint identification technique

developed by WildTrack,

we can actually tell them apart.

The presentation will be by four people:

Fred Kistner, who is at the Karlsruhe Institute of Technology in Germany

and also a member of the WildTrack Specialist Group;

Larissa Slaney, who is a PhD candidate at Heriot-Watt University,

with the FIT Cheetahs Research Project,

and also a member of the WildTrack Specialist Group;

Zoe Jewell and myself, I'm Sky Alibhai, faculty at Duke University

and founders of WildTrack and developers of the footprint identification technique

and also members of the WildTrack Specialist Group.

I'm going to give a very, very brief introduction,

and then I'll hand over to our next speaker.

Apart from the otter species that we work on,

WildTrack works with a large number of endangered species

in different parts of the world, ranging from the Amur tiger in China

to black rhino in Africa to jaguar in Brazil.

All of them use, in one form or another,

our footprint identification technology.

Now what does footprint identification technology actually do?

One of the things about the footprint identification technology, FIT,

is that it works as an add-in in JMP.

It's designed to classify species or subspecies

by using the metrics from the footprints,

classify sex, classify age class, and even classify individuals.

Now those are all elements that are required to understand

the population dynamics of any endangered species,

the essential foundation elements.

The conservation applications of the footprint identification technique:

baseline data on numbers and distribution inform

data-driven scientific conservation strategies,

trade in endangered species,

and human/animal conflict mitigation,

and we'll show how all of these apply

to otter conservation.

Now I'll hand you over to Larissa Slaney, who will start the process of deconfusion.

Thank you.

Right. Thank you very much, Sky, for this great introduction.

Thank you to JMP, for allowing us to present our research here.

We're very pleased about that.

Thank you all so much for being here

and show this interest in our research.

Now before we're going to jump, pun intended,

into explaining our data analysis with JMP,

I would like to give you some background to this research.

We think it is really important to look at this in context

because it's not just about what JMP can do,

but how it is applied in the real world and how we scientists can use it.

It gives us the opportunity to make changes for the better, basically.

Our research looks at footprint analysis as Sky already just said.

We are looking at the footprints of Asian otter species.

Now here you can see four different Asian otter species

and they're all classified as vulnerable or critically endangered by the IUCN.

Their home ranges overlap,

which makes them so-called sympatric otter species.

On the top left, you can see the smooth-coated otter.

Top right, you can see the Asian short-clawed, or small-clawed, otter,

bottom left, the hairy-nosed otter,

and at the bottom right, you can see the Eurasian otter.

Now, to be able to monitor the different species,

we need to collect data, which is difficult

with such very elusive species.

You hardly ever see them in the wild, but you do see their footprints.

Looking at these footprints for each species,

when you look at them here, they look very similar, don't they?

Now, it's quite tricky to tell them apart.

Therefore, we set ourselves the task of finding out whether

there is a way for FIT to distinguish between the footprints

of these four otter species in a scientific and reliable way.

Now, we do have an added problem here, because the front footprints

of two of the species overlap in size,

and the hind footprint of another species

is morphologically very similar,

and also quite similar in size, to the front foot of yet another species.

So we've got a multi-class classification problem here.

Now here you can see a map which shows the distribution ranges

of the different otter species.

The blue area here, that's the Eurasian otter.

Then the red area is the smooth-coated otter.

The yellow down here is the small-clawed otter,

and then the pink over here is the hairy-nosed otter.

But although it looks like a large area, you can see

that it's actually just a few scattered islands.

What is really interesting about this map, though,

is that it shows you where their home ranges overlap.

There are six areas where

at least two, if not even three of the species overlap.

For conservationists, it's really important to find out

where the different species live and how widely they are distributed,

how large the populations are, and find out as much as possible about

the different populations

so that we have a good idea of how endangered they are.

Now, why is otter conservation important?

Well, first of all, otters are classed as keystone species,

and that means that they have an effect on their environment

disproportionate to their abundance.

Just a few individuals can have quite a big impact.

They play a really important part

in the food chain and contribute to the environmental equilibrium.

They're also seen as umbrella species,

which means that they confer protection to a large number of other species.

Basically, if something happens to the otters,

that will have an impact on other species as well.

They're also an indicator species,

so they actually indicate the health of their environment.

They will not live in polluted waters or in polluted wetlands.

An otter returning to an area is always a good sign,

because that means water quality and wetland health is improving.

Now, threats to otters. There are lots of different threats.

Pollution is the one we've already mentioned just now.

Other problems are human-wildlife conflict and habitat loss.

With that also comes loss of prey.

An increasing problem, especially in Asia,

is the illegal wildlife trade.

Traffickers are particularly after otters for the fur trade and also as pets.

Baby otters are taken out of the wild and used as pets,

which is not good for otter conservation at all.

Now, how do you approach a conservation project like this?

Well, first of all, you need to think

about how you want to monitor the population.

What do you want to look at?

Do you want to look at species distribution?

Yes, almost always.

Do you want to look at individual ID?

Do you want to find out what the sex ratio within a population is?

The next thing you need to decide is whether you want to use invasive methods

that potentially stress or even harm the animals,

or non-invasive technologies or methods

to monitor the species, which will not stress or harm the animals.

At WildTrack,

we focus on non-invasive ways to monitor species.

In this particular project,

we are completely focusing on footprint identification.

Now, once you have made those decisions,

you need to obviously train people to help you with the data collection

because you can't be everywhere and you can't go everywhere.

During a pandemic, it's even more difficult,

so you need to train your team both in person and remotely,

which has been a bit of a challenge.

Then comes the data collection.

Both the training and the data collection can happen in situ,

which means in the field, or ex situ, which means in zoos

and other conservation organizations.

Once you get the data in, that is when you start the data analysis.

In our case, that's when we start using JMP.

Other typical issues for any conservation project are funding, of course, and also

trying to get conservation policies

and management strategies for conservation improved.

That's basically our end goal. We are collecting all the data.

We are analyzing all the data so that at the end of the day,

we can give that information to governments

or other organizations and they can make an informed decision

and make better conservation policies.

Let me just go back one more time.

On the left hand side here, actually,

you can see one of our lovely zookeepers collecting footprints for us.

On the right hand side, you can see

a footprint image that was sent to us from the wild.

That's a mystery footprint.

We were asked if we could please find out

which species left that footprint behind.

That's really what we want,

that researchers start to send us footprints

and we can help them find out which species lives in their area.

Fred will, hopefully later on, help us reveal

which species this footprint belongs to.

Now, we've asked ourselves three research questions,

and at the moment we are still focusing on one.

This is an ongoing project.

At the moment, we are focusing on species classification.

Can FIT, the footprint identification technology,

identify or distinguish

between the four different species of otter we are looking at?

When we've got enough data, and in particular data

where we definitely know the individuals,

we will look at individual classification and also at sex classification.

But that's going to be a bit further down the road.

So far we have teamed up with nine zoos and otter conservation organizations.

We've been training them to collect footprints following our FIT protocol.

This has been, again, during COVID, quite challenging.

I've not been able to see everybody in person,

so some people I've had to train remotely,

but they've all been absolutely fantastic, our zoos and zookeepers,

and have really risen to this challenge

and have started to really send in a lot of images

as you can see here on the left.

Overall, the sample size is still much smaller than we would like.

As I said, it's an ongoing project,

but it is enough to let us share

some preliminary results with you now

and draw some conclusions.

We've included three otter species in this so far.

We've only just started to get hairy-nosed otter prints.

There's only one hairy-nosed otter in captivity in the whole wide world.

His or her prints (I'm not sure which) are just starting to come in,

and at a later time we will update

the results with this fourth species in them.

But for now, we're going to look at three otter species.

Yes, so I think it's time to have a closer look at how we do the data analysis.

Over to Fred.

Thank you, Larissa, and let's jump straight into action.

Like Sky mentioned previously,

FIT has been developed for a wide number of species.

When it's fully developed

and has gone into production, it's an add-in for JMP.

Today, I'm going to demonstrate some parts of the data analysis

and some parts of the development before it comes into production.

But first, I'd like to give you

a little bit of background on how this development is usually done.

Our input data is collected

with very little, very simple equipment.

That's one of the main advantages of FIT,

that it can be widely applied with very little equipment.

You only need a smartphone and a ruler.

If you want to develop FIT models for certain species,

you start with an image database, usually collected

from known individuals, as Larissa mentioned.

We therefore cooperate with zoos and other wildlife centers.

These images are then processed within JMP to extract geometric profiles:

a lot of measurements, angles, and distances.

This data can then be used to develop FIT models.

In general, the outputs you want to look at are species, sex, and individuals.

If you're able to identify individuals, you can draw conclusions

about population size.

Once you have developed the method,

you definitely want to test it on unknown individuals.

Again, you process the images and get a prediction from the models.

The advantages of FIT: being based on biometrics, it's non-invasive.

It's a standardized and cost-effective way to monitor elusive wildlife

that cannot be monitored by direct observation.

It can be implemented for almost any species that leaves a footprint.

It can be combined with other non-invasive methods,

and cross-validated models generally have a high accuracy.

How to build these models is something that I would like to demo.

In today's demo, I'm going to look

at four different footprints.

On the top left, you see a hind foot

of an Asian small-clawed otter.

On the top right, you see a left front foot of a smooth-coated otter,

on the bottom left, a left front foot of a Eurasian otter,

and on the right-hand side, you see a right footprint

from an unknown otter from Nepal.

What we are going to do today is process these images.

Then I'm going to show you how to quickly develop

a classification model within JMP.

Then we'll see what the predictions

of these quickly developed models are going to be.

It all starts

with image analysis.

That's a script-based implementation within FIT.

In the first step, you usually adjust the size of an image

so that the footprint is clearly visible and fills most of the field of view.

In order to be replicable,

it's important that footprints are aligned following defined rotation points.

For otters, these are rotation points below the second and the fourth toe.
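To make the alignment step concrete, here is a minimal Python sketch. It assumes that "aligning" means rotating the landmark coordinates about the first rotation point until the line through both rotation points is horizontal; the function name and this exact convention are illustrative assumptions, not FIT's actual implementation.

```python
import math


def align_on_rotation_points(points, p_a, p_b):
    """Rotate (x, y) points about p_a so that the segment p_a -> p_b becomes horizontal.

    p_a and p_b stand in for the two rotation points (for otters, below the
    second and fourth toes); p_a becomes the origin of the aligned coordinates.
    """
    angle = math.atan2(p_b[1] - p_a[1], p_b[0] - p_a[0])
    cos_t, sin_t = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in points:
        dx, dy = x - p_a[0], y - p_a[1]
        # 2D rotation by -angle undoes the tilt of the rotation-point axis
        aligned.append((dx * cos_t - dy * sin_t, dx * sin_t + dy * cos_t))
    return aligned


# Hypothetical coordinates: two rotation points on a 45-degree tilt plus one landmark
aligned = align_on_rotation_points([(1, 1), (2, 2), (1, 2)], (1, 1), (2, 2))
print([(round(x, 3), round(y, 3)) for x, y in aligned])
```

After alignment, every footprint shares the same orientation, so measurements taken from different photos become comparable regardless of how the camera was held.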

Then you set a defined set of landmark points.

Again, this is species-specific.

For otters, I've chosen 11 landmarks.

They're in the center.

Sorry, I forgot one step.

Of course, you need to define a scale first.

Here we've got 10 centimeters, up here.
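The scale step turns pixel coordinates into real units. A minimal sketch, assuming the user marks two points on the ruler a known distance apart (10 cm here); the helper names `cm_per_pixel` and `to_cm` are hypothetical. Each photo has its own scale factor, so it has to be computed per image.

```python
import math


def cm_per_pixel(ruler_start, ruler_end, ruler_length_cm=10.0):
    """Scale factor (cm per pixel) from two points marked on a ruler of known length."""
    pixel_length = math.dist(ruler_start, ruler_end)
    if pixel_length == 0:
        raise ValueError("ruler endpoints must be distinct")
    return ruler_length_cm / pixel_length


def to_cm(points, scale):
    """Convert (x, y) pixel coordinates to centimeters using one image's scale factor."""
    return [(x * scale, y * scale) for x, y in points]


# Hypothetical image: the 10 cm ruler spans 500 pixels, so 1 pixel = 0.02 cm
scale = cm_per_pixel((40, 30), (40, 530))
print(scale, to_cm([(100, 250)], scale))
```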

You can add some additional information.

Just to keep it simple, I'll name this track Asian short-clawed otter.

Then you set 11 landmark points.

You could, for instance, use a crosshair function

if you want to make this as precise as possible,

but for time reasons, I'll just quickly run through them.

After setting 11 landmarks, you derive additional points,

which are helper points that are also used

to extract biometric information.

Once you've done that,

you just start a new table and click Append Row.

I'll just quickly run through three more images.

Again, you need to resize them.

Now, with this image, you can see it's upside down.

What I like about JMP is that

the image window can actually do some image pre-processing.

Now it's showing as a right front foot. Sorry, I need to flip this one more time.

You can do some image processing within JMP, so you don't have to switch

between software.

That's something I really like: I can do all my work

within one piece of software rather than switching between several.

Again, I set 11 landmark points.

This time, I'll just go over them quick and dirty,

and hope that the prediction will still be accurate enough.

This one is a Eurasian otter.

Again, I click Append Row,

just two more times, one for the smooth-coated otter.

Again, I will set the 11 landmarks,

and what the landmarks are used for, I'll show you in a second.

Derive Points, Append Row.

One last time: the mystery footprint that Larissa mentioned,

which was sent to us from a project in Nepal

that is, to my knowledge, doing some otter monitoring there.

One of the species has not been seen there for at least 30 years.

1, 2, 3, 4, 5, 6.

What's different here is that this image has a different scale.

That scale factor is something I need to adjust here.

Again, I'll quickly click through the landmarks.

This is normally a bit more tedious,

but for the demo's sake, I'll click through them quickly.

And this is the unknown species.

The previous one was the smooth-coated otter.

What you end up with is

a big data table.

These values are for

evaluating the quality of the landmarks, which I won't go into here.

You get X and Y coordinates for each landmark.

These X and Y coordinates are used

to calculate a large number of measurements.

More than 100 distances are derived, plus some angles and some areas.

There's quite a lot of information extracted from a single footprint.
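As a rough illustration of how landmark coordinates become measurements, the Python sketch below computes the three kinds of features mentioned: distances, angles, and areas. The exact set of FIT measurements is not described here, so these helpers (all hypothetical names) only show the geometry: pairwise Euclidean distances, the angle at a vertex, and a polygon area via the shoelace formula.

```python
import math
from itertools import combinations


def pairwise_distances(landmarks):
    """Euclidean distance for every unordered pair of landmarks, keyed by index pair."""
    return {
        (i, j): math.dist(landmarks[i], landmarks[j])
        for i, j in combinations(range(len(landmarks)), 2)
    }


def angle_at(vertex, a, b):
    """Angle in degrees (0 to 180) at `vertex` between the rays towards a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(math.degrees(math.atan2(cross, dot)))


def polygon_area(vertices):
    """Area of a simple polygon whose vertices are listed in order (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(
        vertices[k][0] * vertices[(k + 1) % n][1]
        - vertices[(k + 1) % n][0] * vertices[k][1]
        for k in range(n)
    )
    return abs(twice_area) / 2.0


# 11 hypothetical landmarks give 11 * 10 / 2 = 55 landmark-to-landmark distances
landmarks = [(float(i), float((i * i) % 7)) for i in range(11)]
print(len(pairwise_distances(landmarks)))  # 55
```

Adding derived helper points to the landmark list would increase the number of distances further.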

If you repeat this step that I've just shown several times,

you'll end up with a data table like this.

This is the data table that I'm going to demo the prediction model on.

If you have a look at what we have here,

at the distribution of species, our target variable,

you see that I have 405 processed images of Eurasian otters,

278 Asian short-clawed otters, and 127 smooth-coated otters.

The groups are not perfectly balanced,

but each group has a reasonably large sample size,

which will hopefully work for modeling.

Whenever you want to do any sort of supervised modeling,

it's a good idea to split your data into training and test data.

This can be done very easily in JMP.

You can use Make Validation Column under Predictive Modeling.

What I've done is randomly split my data into 80 percent training data

and 20 percent test data,

on which I will test the models that we're going to build

and see how they perform.
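For readers who want to reproduce the idea outside JMP, here is a minimal Python sketch of an 80/20 random split that tags each row as Training or Test. Stratifying by species, so that each class keeps its proportion, is our assumption; the talk only says the split was random. The function name and the fixed seed are illustrative.

```python
import random
from collections import Counter, defaultdict


def make_validation_column(labels, train_fraction=0.8, seed=1):
    """Return one 'Training'/'Test' tag per row, split at random within each label."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rows_by_label = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_label[label].append(row)
    tags = [None] * len(labels)
    for rows in rows_by_label.values():
        rng.shuffle(rows)
        n_train = round(train_fraction * len(rows))
        for k, row in enumerate(rows):
            tags[row] = "Training" if k < n_train else "Test"
    return tags


labels = ["Eurasian"] * 405 + ["ASC"] * 278 + ["Smooth-coated"] * 127
tags = make_validation_column(labels)
print(Counter(tags))  # 648 training rows (324 + 222 + 102) and 162 test rows
```

With the class counts from the talk, this sketch yields the same 648 training rows as the demo; its per-class counts need not match the ones JMP produced.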

All right, I've already done this.

What I'll do now is select my training data,

which is 648 rows,

and have a look at it in a data view.

These are 648 observations.

I'll quickly save this as my training set.

Again, if you have a look at the distribution,

you can see that we have

100 smooth-coated otter prints, 324 Eurasian otter prints, and 223 Asian short-clawed (ASC) otter prints.

The proportions are essentially the same as in the full data set.

In the next step,

I will skip a big part of predictive modeling,

which is variable selection.

I'll assume that I have no prior knowledge, so I'll just add all the variables

that are available.

I have no idea which model is going to work best.

What I'm going to do here is use the Model Screening platform,

which fits several different machine learning models

that are implemented in JMP and compares their performance on this

specific task.

Again, my target variable is the species.

This is the one that I would like to predict.

In total, I have 209 measurements

extracted from my footprint data and these are my X variables.

These are all factors that can potentially be used.

Down here, you can choose the methods you would like to run,

and you can basically choose from all the prediction methods.

But for argument's sake and for runtime,

I'll only run methods that I know will run through quickly.

You can also make the results reproducible by setting a random seed.

What's also good about the Model Screening platform

is that you can add an internal validation step.

We already split our data into training and test sets.

But in order to have an internal validation of the models

that I'm going to develop,

I'll add k-fold cross-validation.

I'll just put a tick here so that the models

are evaluated using k-fold cross-validation.
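To show what the k-fold option does conceptually, here is a small self-contained Python sketch. It uses a nearest-centroid classifier purely as a stand-in model (it is not one of the JMP methods from the demo): rows are dealt into k folds, each fold is held out once while the model is trained on the rest, and the misclassification rate is pooled over all folds. The names, k = 5, and the seed are illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def k_fold_indices(n_rows, k=5, seed=1):
    """Shuffle row indices once and deal them into k disjoint folds."""
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    return [rows[f::k] for f in range(k)]


def fit_centroids(X, y):
    """Stand-in model: the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for d, value in enumerate(features):
            sums[label][d] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def cross_validated_misclassification(X, y, k=5, seed=1):
    """Train on k-1 folds, score the held-out fold, and pool errors over all folds."""
    errors = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in fold)
    return errors / len(X)
```

Because every row is scored exactly once by a model that never saw it, the pooled rate is a fairer internal estimate than scoring the training rows themselves.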

Okay, if I just click Run,

I get a summary.

I see that four of my models

have been evaluated.

You get R-squared

and other performance metrics for those models,

and you can see which one is the best.

You just select the dominant one.

You can look into the training or the test set,

which will also give you a misclassification rate.

You can see that the misclassification rate for Bootstrap Forest was quite impressive:

almost 95 percent of the data was correctly classified

in the validation set.
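The misclassification rate is simply the share of predictions that are wrong, that is, one minus accuracy, so "almost 95 percent correctly classified" corresponds to a misclassification rate of roughly 5 percent. A minimal sketch with hypothetical labels, which also tallies (actual, predicted) pairs as a confusion table showing which species get mixed up:

```python
from collections import Counter


def misclassification_rate(actual, predicted):
    """Fraction of rows whose predicted class differs from the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair; pairs with different labels are errors."""
    return Counter(zip(actual, predicted))


actual = ["Eurasian", "Eurasian", "ASC", "Smooth-coated"]
predicted = ["Eurasian", "ASC", "ASC", "Smooth-coated"]
print(misclassification_rate(actual, predicted))  # 0.25
print(confusion_counts(actual, predicted)[("Eurasian", "ASC")])  # 1
```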

For argument's sake, let's say that the Bootstrap Forest,

the Decision Tree, and the Discriminant Analysis

were the three best models and I'm not sure which one is the best.

I can just run those three selected models.

They'll pop up in their respective platforms.

What I'll do is

save the prediction formula to my Formula Depot.

I'll do this for the Decision Tree model, so I can close this.

I'll do this

for the Discriminant Analysis, which is done here.

Last but not least, I can do this for the Bootstrap Forest model

here.

What I have now is...

Sorry, let me close some of these.

I have three models in my Formula Depot.

I now want to evaluate how these models perform on new data.

Again, I go into my initial data table

and select my test set,

so all the rows that have been randomly chosen

as the validation set.

I'll just quickly save this as Test Set.

In the next step,

I open my Formula Depot and run all those models on the new data table.

I will run them on the test set.

I want to run all three of the models,

so I can do a model comparison

where I run all three models on the test set.

That's actually what I wanted to show.

You'll see down here, I hope you can see this,

let me try to zoom in a little bit, that

the misclassification rates of these models were actually quite low.

The test set is new data that the models have not seen,

and there was a misclassification rate

for the Decision Tree model of 16 percent, while the Discriminant Analysis

and the Bootstrap Forest

only had a misclassification rate of 12 percent.

You have the highest R-squared value for the Bootstrap Forest,

and there are several other metrics that you want to look at.

For us, the most important metric is the prediction, right or wrong.

I usually look a lot at the misclassification rate.

But obviously, all the other metrics

can be used to evaluate and generate good models.
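Picking the model with the lowest test-set misclassification rate takes only a few lines. This sketch uses the three rates reported above and returns every model tied for the lowest value, which makes it explicit when a secondary metric, such as R-squared, is needed to break a tie. The helper name is hypothetical.

```python
def lowest_misclassification(rates, tolerance=1e-9):
    """Return all models whose misclassification rate ties for the minimum."""
    best = min(rates.values())
    return sorted(name for name, rate in rates.items() if rate - best <= tolerance)


test_set_rates = {
    "Decision Tree": 0.16,
    "Discriminant Analysis": 0.12,
    "Bootstrap Forest": 0.12,
}
print(lowest_misclassification(test_set_rates))
# ['Bootstrap Forest', 'Discriminant Analysis']
```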

Last but not least,

we want to have a look at how these models

actually perform on the footprints that we just processed.

Again, we go back into our... Now, where is it?

into our Formula Depot, and we'll run the models

on the four images that we previously

processed.

If we open this data table,

you can have a look at the prediction.

What we can see here... Let me make it a little bit more visible.

So this was

the original data,

and if I just look at the species for the first model,

everything was predicted correctly.

The first model was the Decision Tree that predicted ASC for ASC,

Eurasian otter for Eurasian otter, smooth-coated for smooth-coated,

and same with the Discriminant Analysis and same with the Bootstrap Forest.

All three models actually predicted the right species for the images

that we processed, and all models were also consistent

in predicting Eurasian otter for the unknown footprint.

I'm doing my PhD mainly on the Eurasian otter species.

This was also something that I would have guessed as an expert

on this particular species footprint.

So it's quite consistent and it works really well.

Obviously, you can add more steps and fine-tune

your classification rate even more if you look further into feature selection.

But what I wanted to show you with this demo is how easily the question

of which species it is can be answered using JMP.

I hope this was quite intuitive.

Let me draw some conclusions from these results.

What I really like is that JMP is a great all-in-one solution.

I can extract biometric data from images within the FIT add-in.

I don't have to switch software to do the data analysis.

My extracted biometric data can be analyzed directly,

and not only in a descriptive way:

I can build classification models.

There are state-of-the-art machine learning models implemented,

so it's an all-in-one solution.

There's obviously other ways to do that as well,

but I just like the practicality.

For our particular research question,

our otter species classification models have performed very well at this stage,

given how much data we have.

They're able to predict the species

of single unknown otter footprints with a high classification rate.

For instance, here

the neural net, which I did not run in this example

because the runtime is a little bit longer,

had a misclassification rate of only 10 percent on the same test set.

That improves the classification accuracy by another two percentage points.

If you dive in deeper, you can increase this a little bit more.

One thing that, in our experience of working with footprints in the field,

increases the classification accuracy by quite a lot

is to work with trails

rather than with single footprints.

An animal, depending on where you are and on the substrate,

doesn't leave only a single footprint.

Without going into too much detail, a footprint can be quite a complex matter,

because it varies a lot with the substrate that the animal

is moving through, with the animal's speed, and with its gait.

There can be quite a bit of noise and background variation.

If you work with multiple footprints and take the average of the predictions

instead of using single footprints,

you can account for this variation much better.

This becomes especially important when you want to look at individual identification.
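Averaging over a trail can be sketched as follows: each footprint gets a vector of class probabilities from the model, the vectors are averaged across the trail, and the class with the highest mean probability wins. This is a minimal sketch under that assumption; the probabilities below are made-up illustration values, not output from the otter models.

```python
def predict_trail(footprint_probabilities):
    """Average per-footprint class probabilities over a trail and pick the top class.

    footprint_probabilities: list of dicts mapping species -> probability,
    one dict per footprint in the trail (all dicts share the same species keys).
    """
    if not footprint_probabilities:
        raise ValueError("a trail needs at least one footprint")
    n = len(footprint_probabilities)
    species = footprint_probabilities[0].keys()
    mean = {s: sum(p[s] for p in footprint_probabilities) / n for s in species}
    return max(mean, key=mean.get), mean


trail = [
    {"Eurasian": 0.5, "Smooth-coated": 0.3, "ASC": 0.2},
    {"Eurasian": 0.3, "Smooth-coated": 0.6, "ASC": 0.1},  # a noisy print
    {"Eurasian": 0.7, "Smooth-coated": 0.2, "ASC": 0.1},
]
label, mean = predict_trail(trail)
print(label)  # Eurasian
```

On its own, the noisy second print would be classified as smooth-coated; averaging over the trail outweighs it.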

Last but not least,

JMP is also great to gain insights on the importance of variables.

There are several ways within JMP

to look into which variables are actually

contributing the most to predictions.

You can look at tree-based methods,

where you look at the classification trees and where the splits are made.

You can look into column contributions, again for tree-based methods,

if you work with Bootstrap Forest, XGBoost, or something like that.

You can see how many times each column was chosen for a split.
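The idea behind a column-contributions summary can be sketched by tallying, for every split in every tree of a forest, which column was used and how much it improved the fit. A minimal Python sketch, assuming the split records have already been extracted from the fitted trees as (column, gain) pairs; the column names and gains are hypothetical, and this is not JMP's own computation.

```python
from collections import defaultdict


def column_contributions(splits):
    """Summarize (column, gain) split records into one row per column.

    Returns (column, number_of_splits, total_gain, share_of_total_gain) tuples,
    sorted so the most influential column comes first.
    """
    counts = defaultdict(int)
    gains = defaultdict(float)
    for column, gain in splits:
        counts[column] += 1
        gains[column] += gain
    total = sum(gains.values())
    rows = [
        (column, counts[column], gains[column], gains[column] / total if total else 0.0)
        for column in counts
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical split records collected from a small forest
splits = [("toe_spread", 4.0), ("heel_width", 1.0), ("toe_spread", 3.0)]
for row in column_contributions(splits):
    print(row)
```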

You could look into the Prediction Profiler

if you use the standard modeling platforms within JMP.

You can see how the prediction of your model

changes if you change certain values.

Or you can do something like a Discriminant Analysis,

where you just follow

the F-ratios as variables are selected.
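For a single measurement, the F-ratio compares how far apart the species means are with how much the measurement varies within each species; large values flag good separators. The sketch below computes the one-way ANOVA F-ratio, F = (SS_between / (k - 1)) / (SS_within / (N - k)), for one variable. Stepwise discriminant analysis, after its first step, uses partial F-ratios conditional on the variables already selected, which this simplified sketch does not do; the measurement values are hypothetical.

```python
def one_way_f_ratio(values_by_group):
    """One-way ANOVA F-ratio: between-group mean square over within-group mean square."""
    groups = [list(values) for values in values_by_group.values()]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(not g for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and more rows than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("F-ratio is undefined when there is no within-group variation")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toe-spread measurements (cm) for two species
print(one_way_f_ratio({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}))  # 13.5
```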

This will then, again, give you a lot of insights,

and what you need really depends on your question.

Do you want a prediction from an external adviser

of what species you're looking at?

Or do you want to give people working in the field a guide

to which measurements are worth looking at

when they're in the field

and want to make the classification on the spot,

just based on expert knowledge?

Yeah, so that's basically it from my side.

I would like to thank all the contributing zoos and wildlife parks who allowed us

to work with their beautiful animals and share data with us.

Especially, I would like to thank Grace Yoxon from the IOSF,

the International Otter Survival Fund, who got us into contact with many of them;

KHYS, our Karlsruhe House of Young Scientists,

who actually funded this study;

and most importantly, Joseph Morgan from JMP,

who is very helpful when it comes to modeling and FIT

and has given me advice

more than once when I've had JSL scripting issues.

Yeah, that's pretty much it from my side.

Thank you very much for listening so far, everyone.

I just really want to round up here.

I think that Sky, and Larissa, and Fred have outlined beautifully

the challenges we're facing in identifying these different species of otter

around the world and the way in which JMP can help us

classify them and bring some clarity to this picture.

Where are they and how many are there?

I'd like to just quickly talk about what's next.

How are we building on this?

One of the things we're doing is building

artificial intelligence into this picture, which will allow us to

filter and sort a much greater volume of data as it comes in.

As both Fred and Larissa have said, we need more data.

Artificial intelligence has potential,

which we've already seen in some early training,

test, and field data,

as you can see at the bottom of this screen.

We're getting reasonably good accuracy in our initial trials with AI.

We think that it will never give us quite the resolution that JMP will give us.

What we're aiming to do is have an AI platform

which will be easy for citizen scientists to feed data into.

We'll integrate JMP as the top-level classifier on that platform.

But the key here really is that this cryptic ground evidence

left behind by otters and all the other species

is there for us to decode if we can find a way to do it.

It really is transformative in conservation

to be able to have a cheap and quick technique

to know where these endangered species are.

We're very optimistic that using a baseline AI classifier

with JMP as a final classifier, we'll be able to make that technique

not only deliver the data we need, but integrate people all over the world

as citizen scientists to be part of that.

We're really grateful to the JMP community for supporting us

through this whole journey.

We're constantly making new strides.

We know that there's interest in this community,

and we hope that they will join us

and even start collecting data themselves when our new mobile app comes out,

pushing this forward to where we want it to be,

which is a classifier

for all endangered species all over the world.

Here's to lots and lots of points on the map,

and thank you all for listening.

It's Otterly Confusing! Short-Clawed, Hairy-Nosed, Smooth-Coated or Eurasian? Just ask JMP! (2022-EU-45MP-1047)
