Thank you for joining us.
The title of our talk today
is It's Otterly confusing! Short-Clawed, Hairy-Nosed, Smooth-Coated, or Eurasian?
Just ask JMP!
These four species occur in the same habitat in Asia, in Eurasia,
and the question is, how do you tell them apart?
Using our FIT technique, the footprint identification technique
developed by WildTrack,
we can actually tell them apart.
The presentation will be by four people:
Fred Kistner, who is at the Karlsruhe Institute of Technology in Germany
and also a member of the WildTrack Specialist Group;
Larissa Slaney, who is a PhD candidate at Heriot-Watt University,
FIT Cheetahs Research Project
and also a member of the WildTrack Specialist Group.
Zoe Jewell and myself, I'm Sky Alibhai, faculty at Duke University
and founders of WildTrack and developers of the footprint identification technique
and also members of the WildTrack Specialist Group.
I'm going to do a very brief demo, a very, very brief introduction,
and then I'll hand over to our next speaker.
Apart from the otter species that we work on,
WildTrack has an extensive number of eminent species that we work on
in different parts of the world, ranging from the Amur Tiger in China
to black rhino in Africa to jaguar in Brazil.
All of them utilize, in one form or another,
our footprint identification technology.
Now what does footprint identification technology actually do?
One of the things about the footprint identification technology, FIT,
is that it works as an add-in in JMP.
It's designed to classify species or subspecies
by using the metrics from the footprints,
classify sex, classify age class, and even classify individuals.
Now those are all elements that are required to understand
the population dynamics of any endangered species,
the essential foundation elements.
The conservation applications of the footprint identification technique:
the baseline data on numbers and distribution inform
data-driven scientific conservation strategies,
trade in endangered species,
human/animal conflict mitigation,
and all these in a way will be shown
to represent the way in which otter conservation works.
Now I'll hand you over to Larissa Slaney, who will start the process of deconfusion.
Right. Thank you very much, Sky, for this great introduction.
Thank you to JMP, for allowing us to present our research here.
We're very pleased about that.
Thank you so much for all of you to be here
and show this interest in our research.
Now before we're going to jump, pun intended,
into explaining our data analysis with JMP,
I would like to give you some background to this research.
We think it is really important to look at this in context
because it's not just about what JMP can do,
but how this is applied in the real world and how we scientists can use it to...
It gives us the opportunity to make changes for the better, basically.
Our research looks at footprint analysis as Sky already just said.
We are looking at the footprints of Asian otter species.
Now here you can see four different Asian otter species
and they're all classified as vulnerable or critically endangered by the IUCN,
and their home ranges overlap,
so they're so-called sympatric otter species.
On the top left, you can see the smooth-coated otter.
Top right, you can see the Asian short- or small-clawed otter,
bottom left, the hairy-nosed otter,
and at the bottom right, you can see the Eurasian otter.
Now, to be able to monitor the different species,
we need to collect data, which is difficult
with such very elusive species.
You hardly ever see them in the wild, but you do see their footprints.
Looking at these footprints for each species,
when you look at them here, they look very similar, don't they?
Now, it's quite tricky to tell them apart.
Therefore, we set ourselves the task to find out whether
there is a way for FIT to distinguish between the footprints
of these four otter species in a scientific and reliable way.
Now, we do have an added problem here, because the front footprints
of two of the species have a size overlap
and the hind footprint of another species
is morphologically very similar,
and also size-wise quite similar to the front foot of another species.
We've got a multi-class classification problem here.
Now here you can see a map which shows the distribution ranges
of the different otter species.
The blue area here, that's a Eurasian otter.
Then the red area is the smooth-coated otter.
The yellow down here is the small-clawed otter,
and then the pink over here is the hairy-nosed otter.
But you can see it, although it looks like a large area,
there's actually just a few dotted islands there.
What is really interesting about this map, though,
it shows you where their home ranges overlap.
There are six areas where
at least two, if not even three of the species overlap.
For conservationists, it's really important to find out
where the different species live, to what extent,
how large the populations are, and find out as much as possible about
the different populations
so that we have a good idea of how endangered they are.
Now, why is otter conservation important?
Well, first of all, otters are classed as Keystone species,
and that means that they have an effect on their environment
disproportionate to their abundance.
Just a few individuals can have quite a big impact.
They play a really important part
in the food chain and contribute to the environmental equilibrium.
They're also seen as Umbrella species,
which means that they confer protection to a large number of other species.
Basically, if something happens to the otters,
that will have an impact on other species as well.
They're also an Indicator species,
so they actually indicate the health of their environment.
They will not live in polluted waters or in polluted wetlands.
Otters returning to an area is always a good sign,
because that means water quality and wetland health is improving.
Now, threats to otters. There are lots of different threats.
Pollution is the one we've already mentioned just now.
But another problem is human-wildlife conflict, and habitat loss.
With that also comes loss of prey.
An increasing problem, especially in Asia,
is the illegal trade, the illegal wildlife trade.
They are particularly after the fur, the fur trade and also after pets.
Baby otters are taken out of the wild and used as pets,
which is not good for otter conservation at all.
How do you approach a conservation project like this?
Well, first of all, you need to think
about how do you want to monitor the population.
What do you want to look at?
Do you want to look at species distribution?
Yes, almost always.
Do you want to look at individual ID?
Do you want to find out what the sex ratio within a population is?
The next thing you need to decide is do you want to use invasive methods
that potentially stress or even harm the animals?
Or do you want to use non-invasive technologies or methods
to monitor the species which will not stress and harm the animals?
We, at WildTrack,
we focus on non-invasive ways to monitor species.
In this particular project,
we are completely focusing on footprint identification.
Now, once you have made those decisions,
you need to obviously train people to help you with the data collection
because you can't be everywhere and you can't go everywhere.
During times of pandemics, it's even more difficult.
So you need to train your team both in person and remotely,
and that has been a bit of a challenge.
Then you need to get all the data collection,
and the training and the data collection can happen in-situ,
which means in the field, or ex-situ, which means in zoos
and other conservation organizations.
Once you get the data in, that is when you start the data analysis.
In our case, that's when we start using JMP.
Other typical issues for any conservation project are funding, of course, and also
trying to get conservation policies improved,
and management strategies for conservation to have those improved.
That's basically our end goal. We are collecting all the data.
We are analyzing all the data so that at the end of the day,
we can give that information to governments
or other organizations and they can make an informed decision
and make better conservation policies.
Let me just go back one more time.
On the left hand side here, actually,
you can see one of our lovely zookeepers collecting footprints for us.
On the right hand side, you can see
a footprint image that was sent to us from the wild.
That's a mystery footprint.
We were asked if we could please find out
which species left that footprint behind.
That's really what we want,
that researchers start to send us footprints
and we can help them find out which species lives in their area.
Fred will, hopefully later on, help us reveal
which species this footprint belongs to.
Now, we've asked ourselves three research questions,
and at the moment we are still focusing on one.
This is an ongoing project.
At the moment, we are focusing on species classification.
Can FIT, the footprint identification technology,
identify or distinguish
between the four different species of otter we are looking at?
When we've got enough data and enough particular data,
where we definitely know the individuals,
we will look at individual classification and also at sex classification.
But that's going to be a bit further down the road.
So far we have teamed up with nine zoos and otter conservation organizations.
We've been training them to collect footprints following our FIT protocol.
This has been, again, during COVID, quite challenging.
I've not been able to see everybody in person,
so some people I've had to train remotely,
but they've all been absolutely fantastic, our zoos and zookeepers,
and have really risen to this challenge
and have started to really send in a lot of images
as you can see here on the left.
It's still overall a much smaller sample size than we want to have.
As I said, it's an ongoing project, but it
is enough to give us
the ability now to share some preliminary results with you
so we can draw some conclusions.
We've included three otter species in this so far.
We've only just started to get hairy-nosed otter prints.
There's only one hairy-nosed otter in captivity in the whole wide world.
His or her, I'm not sure, prints are just starting to come in,
and we will update at a later time
the results with this fourth species in it.
But for now, we're going to look at three otter species.
Yes, so I think it's time to have a closer look at how we do the data analysis.
Over to Fred.
Thank you, Larissa, and let's jump straight into action.
Like Sky mentioned previously,
FIT has been developed for a wide number of species.
When it's fully developed
or after leaving production, it's an add-in into JMP.
Today, I'm going to demonstrate some parts of the data analysis
and some parts of the development before it comes into production.
What I am going to say is, in general, I just wanted to give you
a little bit of a background of how this development is usually done.
Our input data is collected
with very little equipment and very simple equipment.
That's one of the main advantages of FIT,
that it can be widely applied with very little equipment.
You only need a smartphone and a ruler.
If you want to develop FIT models for certain species,
you start with an image database that is usually collected
of known individuals as Larissa mentioned.
We therefore cooperate with zoos and other wildlife centers.
These images are then processed within JMP to extract geometric profiles
that extract a lot of measurements, angles, distances.
This data can then be used to develop FIT models.
The general output is that you want to look at species, sex, individuals.
If you're able to identify individuals, you want to draw conclusions
about population size.
Once you develop the method,
you definitely want to test this on unknown individuals.
Again, you look at images and get a prediction of the models.
Advantages of FIT: it's based on biometrics, it's non-invasive.
It's a standardized and cost-effective way to monitor elusive wildlife
that cannot be monitored by direct observation.
It can be implemented for almost any species that leaves a footprint.
It can be combined with other non-invasive methods
and cross-validated models generally have a high accuracy.
How to build these models is something that I would like to demo.
What I'm going to demo today is technically looking
at different footprints.
You see on the top left, you see a hind foot
of an Asian small-clawed otter.
On the top right, you see a left front of a smooth-coated otter,
on the bottom left, a left front of a Eurasian otter,
and on the right hand side, you see a right foot footprint
from an unknown otter from Nepal.
What we are going to do today is we process these images.
Then I'm going to show you how to quickly develop
a classification model within JMP.
Then I'll see what the predictions
of these quickly developed methods are going to be.
It all starts
with image analysis.
That's script-based implementation within FIT.
In the first step, you usually adjust the size of an image
so that the footprint is clearly visible and the dominating part of the field.
In order to be replicable,
it's important that footprints are aligned following defined rotation points.
For otters, these are rotation points below the second and the fourth toe.
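
As an illustration of the alignment step just described, here is a minimal Python sketch. It is not the FIT add-in's own code (that is script-based inside JMP); the function name `align_footprint` and the choice to make the line between the two rotation points horizontal are assumptions made for this example.

```python
import math

def align_footprint(points, pivot_a, pivot_b):
    """Rotate points about pivot_a so the line pivot_a -> pivot_b lies on the x-axis.

    pivot_a and pivot_b stand in for the two rotation points (for otters,
    below the second and fourth toe). Hypothetical helper, for illustration.
    """
    # Angle of the line between the two rotation points.
    angle = math.atan2(pivot_b[1] - pivot_a[1], pivot_b[0] - pivot_a[0])
    # Rotating by the negative angle brings that line to horizontal.
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    aligned = []
    for x, y in points:
        dx, dy = x - pivot_a[0], y - pivot_a[1]
        aligned.append((dx * cos_a - dy * sin_a, dx * sin_a + dy * cos_a))
    return aligned

# Two rotation points on a 45-degree line end up on the same horizontal line.
aligned = align_footprint([(1.0, 1.0), (2.0, 2.0)], (1.0, 1.0), (2.0, 2.0))
```

After alignment, every footprint is measured in the same orientation, so landmark coordinates from different photos become comparable.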
Then you set a defined set of landmark points.
Again, for otters, this is species-specific.
But for otters, I've chosen 11 landmarks.
They're in the center.
Sorry, forgot one step.
Of course, you need to define a scale first.
Here we got 10 centimeters. This is up here.
You can add some additional information.
Just to keep it simple, I will name this track Asian short-clawed otter.
Then you set 11 landmark points.
You could, for instance, use a crosshair function
if you want to make this as precise as possible, obviously,
but for time reason, I'll just quickly run through them.
After setting 11 landmarks, you derive additional points,
which are helping points that are also used
to extract biometric information.
Once you've done that,
you'll just start a new table and you go for append row.
I'll just quickly run through three more images.
Again, you need to resize them.
Now, with this image, you can see it's upside down.
What I like about JMP is that
the image window can actually do some image pre-processing.
Now, it's right front. Sorry, I need to flip this one more time.
Can do some image processing within JMP and so you don't have to change
in between software.
That's something that I really like that I can do all my work
within one software other than switching in between several software.
Again, I set 11 landmark points.
This time, I'll just go over them quick and dirty,
and hope that the prediction will be accurate enough.
That's a Eurasian otter.
Again, I go append
just two more times, one for the smooth-coated otter.
Again, I will set the 11 landmarks,
and what the landmarks are used for, I'll show you in a second.
Derive points, append row.
One last time, the mystery footprint that Larissa mentioned
that was sent to us from a project in Nepal,
that is, to my knowledge, doing some otter monitoring there.
One of the species has not been seen for at least 30 years.
1, 2, 3, 4, 5, 6.
What's different in here is that you have a different scale.
That scale factor is something that I need to adjust within here.
Again, I'll quickly click through the images.
This is normally done a little bit more carefully,
but for demo sake, I'll try to click through them quickly.
And this is an unknown.
That was the smooth-coated.
What you end up with is
a big data table.
These are points for
evaluating the quality of the landmark, which I did not go into within here.
You get X and Y coordinates for each landmark.
These X and Y coordinates are used
to calculate a large number of measurements.
There's more than 100 distances derived, some angles and some areas.
There's quite a lot of information extracted out of a single footprint.
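
To make the idea of turning landmark coordinates into measurements concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the FIT add-in's actual geometric profile: the function names are hypothetical, and which distances, angles, and areas FIT really uses is not specified in the talk. Note that 11 landmarks alone give 55 pairwise distances; the derived helper points add more.

```python
import math
from itertools import combinations

def scale_factor(ruler_start, ruler_end, ruler_cm):
    """Centimetres per pixel, from two points marked on a ruler of known length."""
    return ruler_cm / math.dist(ruler_start, ruler_end)

def pairwise_distances(landmarks, cm_per_px=1.0):
    """All point-to-point distances, keyed by the pair of landmark indices."""
    return {
        (i, j): math.dist(landmarks[i], landmarks[j]) * cm_per_px
        for i, j in combinations(range(len(landmarks)), 2)
    }

def angle_at(vertex, p, q):
    """Angle in degrees (0 to 180) at `vertex` between the rays to p and q."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    deg = abs(math.degrees(a1 - a2)) % 360
    return 360 - deg if deg > 180 else deg

def polygon_area(points, cm_per_px=1.0):
    """Area enclosed by the points in order, using the shoelace formula."""
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2 * cm_per_px ** 2

# Toy example: a ruler spanning 100 px for 10 cm, and four made-up landmarks.
cm_per_px = scale_factor((0, 0), (100, 0), 10)
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
distances = pairwise_distances(square, cm_per_px)
```

Each footprint then becomes one row of numbers, which is the form a classification model needs.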
If you repeat this step that I've just shown several times,
you'll end up with a data table like this.
This is the data table that I'm going to demo the prediction model on.
If you have a look what we have here,
if you look at the distribution of species, our target variable,
you see that I have 405 processed images of Eurasian otters,
278 Asian short-clawed otters, and 127 smooth-coated otters.
It's not perfectly equally distributed groups,
but at least each group has quite a significant sample size,
which will hopefully work for modeling.
Whenever you want to do any sort of supervised modeling,
it's a good idea to split your data into training and test data.
This can be very easily done in JMP.
You have Make Validation Column within the predictive modeling platform.
What I've done is I randomly split my data into 80 percent training data
and 20 percent test data,
where I will test the models that we're going to build on
and see how they perform.
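
The 80/20 split described here can be sketched outside JMP as follows. This is a minimal Python illustration, not what Make Validation Column does internally. Splitting within each species (stratifying) is an assumption of the sketch; the talk only says the split was random. The function name and the seed are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=1):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)

# Class counts taken from the talk: 405 Eurasian, 278 ASC, 127 smooth-coated.
labels = ["Eurasian"] * 405 + ["ASC"] * 278 + ["Smooth-coated"] * 127
train, test = stratified_split(labels)
# With these counts the sketch yields 648 training rows and 162 test rows.
```

Stratifying keeps the species proportions roughly the same in both sets, which matters when the groups are unequal in size.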
All right, so I've previously done this.
What I'll do now is I'll just select my training data,
which are 648 rows,
and I will just have a look into a data view.
This is 648 observations.
I'll quickly save this as my training set.
Again, if you have a look at the distribution,
you can see that we have
100 smooth-coated otter prints, 324 Eurasian otter prints, and 223 ASC.
It's the same distribution percentages as with the previous data set.
In the next step,
I will skip out a big part when it comes to predictive modeling,
that is variable selection.
I assume that I have no prior knowledge, so I just add in all variables
that are available.
I have no idea which model is going to work best.
What I'm going to do here is I'll use the model screening platform
that compares several different machine learning models
that are implemented in JMP and just compares their performance on this.
Again, my target variable is the species.
This is the one that I would like to predict.
In total, I have 209 measurements
extracted from my footprint data and these are my X variables.
These are all factors that can potentially be used.
What you see down here is you can choose the method that you would like to run
and you can basically choose through all the prediction methods.
But for argument's sake and for runtime,
I'll only run methods that I know that will run through quickly.
You can again make it reproducible by setting a random seed.
What's also good about a model screening platform
is you can add an internal validation step.
We already split our data into training and test set.
But in order to have an internal validation on these models
that I'm going to develop,
I'll add k-fold cross-validation.
I'll just put a tick in here so that the several models
are evaluated using the k-fold cross-validation method.
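
For readers unfamiliar with k-fold cross-validation, here is a minimal Python sketch of how the training rows are partitioned. It only illustrates the general technique, not JMP's implementation. The value k=5, the seed, and the function name are assumptions; the talk does not state the number of folds used.

```python
import random

def k_fold_indices(n_rows, k=5, seed=1):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    rng = random.Random(seed)  # a fixed seed keeps the folds reproducible
    order = list(range(n_rows))
    rng.shuffle(order)
    # Deal the shuffled rows into k folds of nearly equal size.
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# 648 is the training-set size mentioned in the talk.
splits = list(k_fold_indices(648, k=5))
```

Each row is held out exactly once, so every model is scored on data it was not fitted to, without touching the separate test set.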
Okay, if I just click quickly, go on run.
I'll have a summary outlook.
I'll see that four of my models
have been evaluated.
You'll get R-square,
you'll get performance metrics for those several models,
and you can just say which one is the best.
You just select the dominant one.
You can look into the training or into the test set
which will also give you a misclassification rate.
You can see that misclassification rate for Bootstrap Forest was quite impressive.
Almost 95 percent of the data was correctly classified
in the validation set.
For argument's sake, let's say, that the Bootstrap Forest,
the Decision Tree, and the Discriminant Analysis
were the three best models and I'm not sure which one is the best.
I can just run those three models as selected.
They'll pop up in their respective platform.
What I can do is I'll just
save the prediction formula into my Formula Depot.
I'll do this for the Decision Tree model, so I can close this.
I'll do this
for the Discriminant Analysis, which is done here.
Last but not least, I can do this for the Bootstrap Forest model.
What I have now is...
This one I can close, I'm sorry.
I have three models in my Formula Depot.
I now want to evaluate how these models perform on new data.
Again, I go into my initial data table
and I select my test set,
so all the rows that have been randomly chosen
as a validation set.
I'll just quickly save this as Test Set.
In the next step, I can technically
open my Formula Depot and I can run all those models in the new data table.
I will run them on the test set.
I want to run all three of the models.
I can do a model comparison
where I run all the three models on the test set.
That's actually what I wanted to show.
You'll see down here, I hope you can see this.
Try to zoom in a little bit that
the misclassification rates of these models were actually quite low.
The test set is new data that has not been seen by the models
and there was a misclassification rate
for the Decision Tree model of 16 percent, while the Discriminant Analysis
and the Bootstrap Forest
only had a misclassification rate of 12 percent.
You have the highest R-square value for the Bootstrap Forest,
and there are several other metrics that you want to look at.
For us, the most important metric is the prediction, right or wrong.
I usually look a lot at the misclassification rate.
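
The misclassification rate mentioned here is simply the share of footprints whose predicted species differs from the true one. A minimal Python sketch follows; the function names are hypothetical and the five example rows are made up for illustration, not taken from the otter data.

```python
from collections import Counter

def misclassification_rate(actual, predicted):
    """Fraction of rows where the predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

def confusion_counts(actual, predicted):
    """Count of each (actual, predicted) pair, i.e. a sparse confusion matrix."""
    return Counter(zip(actual, predicted))

# Made-up labels: one ASC print is mistaken for a Eurasian otter.
actual = ["ASC", "ASC", "Eurasian", "Eurasian", "Smooth-coated"]
predicted = ["ASC", "Eurasian", "Eurasian", "Eurasian", "Smooth-coated"]
rate = misclassification_rate(actual, predicted)  # 1 wrong out of 5
confusion = confusion_counts(actual, predicted)
```

The confusion counts show which species get mixed up with which, something the single rate cannot tell you.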
But obviously, all the other metrics
can be used to evaluate and generate good models.
Last but not least,
we want to have a look of how these models
actually perform on the footprints that were just previously processed.
Again, we go back into our... Now, where is it?
In our Formula Depot, and we'll run this
on the four images that we previously processed.
If we open this data table,
you can have a look at the prediction.
What we can see here. I'll make it a little bit more obvious.
So this was
the original data,
and if I just look into the species for the first model,
everything was predicted correctly.
The first model was the Decision Tree that predicted ASC for ASC,
Eurasian otter for Eurasian otter, smooth-coated for smooth-coated,
and same with the Discriminant Analysis and same with the Bootstrap Forest.
All three models actually predicted the right species to the images
that we have processed, and all models were also consistent
about their prediction when it comes to the Eurasian otter.
I'm doing my PhD mainly on the Eurasian otter species.
This was also something that I would have guessed as an expert
on this particular species footprint.
So it's quite consistent and it works really well.
Obviously, you can add more steps and you can fine tune
your classification rate even more if you look more into feature selection.
But I think what I wanted to show you with this demo is how easily a question
of what species it is can be answered using JMP.
I hope this was quite intuitive.
Let me draw some conclusions on that, some results.
What I really like about JMP is that it's a great all-in-one solution.
I can extract biometric data from images within the FIT add-in.
I don't have to switch software to do the data analysis.
Directly, my extracted biometric data can be analyzed.
And not only analyzed in a descriptive way,
I can build classification models.
There's state-of-the-art machine learning models implemented into that,
and so it's a one-and-all solution.
There's obviously other ways to do that as well,
but I just like the practicality.
For our particular research question,
so our otter species specific classification models at this stage,
how much data we have, they have performed very well.
They're able to predict the species
of single unknown otter footprints with high classification rate.
For instance, here
neural net, which I did not run in this example
because the runtime is a little bit longer,
had a misclassification rate of only 10 percent on the same test set.
It could even increase the classification accuracy by another two percent.
If you dive in deeper, you can even increase this a little bit more.
One thing that, in our experience from working with footprints in the field,
increases the classification accuracy by quite a lot
is to work with trails
rather than with single footprints.
An animal, depending on where you are, depending on the substrate,
doesn't only leave a single footprint.
Without going too much into detail, a footprint can be a quite complex matter
because it varies a lot with the substrate that an animal
is moving through, with the speed of the animal, with the gait.
There can be quite a bit of noise and background variation.
If you work with multiple footprints and take an average of a prediction
instead of single footprints,
you can address this variation quite a bit more.
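
One simple way to average predictions over a trail is to take the mean of the class probabilities across its footprints and pick the most probable species. The Python sketch below assumes each footprint already has per-species probabilities from some model; the function name and the numbers are hypothetical, and the talk does not specify exactly how FIT combines footprints within a trail.

```python
def trail_prediction(footprint_probs):
    """Average per-footprint class probabilities and return (best_class, means)."""
    classes = list(footprint_probs[0].keys())
    n = len(footprint_probs)
    mean_probs = {c: sum(p[c] for p in footprint_probs) / n for c in classes}
    best = max(mean_probs, key=mean_probs.get)
    return best, mean_probs

# Made-up probabilities for three footprints from one trail.
trail = [
    {"Eurasian": 0.40, "ASC": 0.50, "Smooth-coated": 0.10},  # noisy print
    {"Eurasian": 0.80, "ASC": 0.10, "Smooth-coated": 0.10},
    {"Eurasian": 0.70, "ASC": 0.20, "Smooth-coated": 0.10},
]
species, mean_probs = trail_prediction(trail)
```

Here the first print on its own would be called ASC, but the trail as a whole points to a Eurasian otter, which is how averaging dampens noise from substrate, speed, or gait.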
This becomes especially important when you want to look at individual identification.
Last but not least,
JMP is also great to gain insights on the importance of variables.
There's several ways within JMP
that you can look into which are actually the variables
that are contributing the most to predictions.
You can either look at tree-based methods,
where you look at the classification trees and where the splits are done.
You can look into column contributions, again for tree-based methods,
if you work with Bootstrap Forest, or XGBoost, or something like that.
You can see which columns are actually chosen, and how many times.
You could look into the prediction profiler
if you use the normal modeling platform within JMP.
You can technically have a look of what the prediction of your model,
how it's going to change if you change certain values.
Or you can do something like a Discriminant Analysis,
where you just follow
the F-ratios of how variables are selected.
This will then, again, give you a lot of insights
because it really depends on your question.
Do you want to have a prediction from an external advisor
of what species you're looking at?
Or do you want to give a guide for working in the field
of which measurements are worth looking into
when you're in the field,
when you want to make the classification on the subject,
like just on expert knowledge?
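
As a rough illustration of the F-ratio idea mentioned above, the Python sketch below computes a one-way ANOVA F-ratio for a single measurement across species. This is a simplification and an assumption of the example: a stepwise discriminant analysis typically computes F-ratios conditional on the variables already in the model, which this univariate version does not do. The function name and the numbers are made up.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up values of one measurement (say, a toe-to-pad distance in cm)
# for three species, three footprints each.
measurement_by_species = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = f_ratio(measurement_by_species)  # 13.0 for these numbers
```

A larger F-ratio means the species differ more on that measurement relative to the spread within each species, so it is a candidate for a field guide.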
Yeah, so that's basically it from my side.
I would like to thank all the contributing zoos and wildlife parks who allowed us
to work with their beautiful animals and share data with us.
Especially, I would like to thank Grace Yoxon from the IOSF,
the International Otter Survival Fund, who got us into contact with many of them,
KHYS, our Karlsruhe House of Young Scientists
who actually funded this study,
and most importantly, Joseph Morgan from JMP,
who is very helpful when it comes to modeling and FIT.
He has given me advice
more than once when I ran into JSL scripting issues.
Yeah, that's pretty much it from my side.
Thank you very much for listening so far, everyone.
I just really want to round up here.
I think that Sky, and Larissa, and Fred have outlined beautifully
the challenges we're facing in identifying these different species of otter
around the world and the way in which JMP can help us
classify them and bring some clarity to this picture.
Where are they and how many are there?
I'd like to just quickly talk about what's next.
How are we building on this?
One of the things we're doing is building
artificial intelligence into this picture so that it will allow us to
filter and sort a much greater volume of data as it comes in.
As both Fred and Larissa have said, we need more data.
Artificial intelligence has the potential,
which we've already exposed in some early training
and test and field data
as you can see at the bottom of this screen.
We're getting reasonably good accuracy in our initial trials with AI.
We think that it will never give us quite the resolution that JMP will give us.
What we're aiming to do is have an AI platform
which will be easy for citizen scientists to feed data into.
We'll have JMP as the top level classifier on that platform by integrating it.
But the key here really is that this cryptic ground evidence
left behind by otters and all the other species
is there for us to decode if we can find a way to do it.
It really is transformative in conservation
to be able to have a cheap and quick technique
to know where these endangered species are.
We're very optimistic that using a baseline AI classifier
with JMP as a final classifier, we'll be able to make that technique
not only deliver the data we need, but integrate people all over the world
as citizen scientists to be part of that.
We're really grateful to the JMP community for supporting us
through this whole journey.
We're constantly making new strides.
We know that there's interest in this community,
and we hope that they will join us
when our new mobile app comes out to even start collecting data themselves
and pushing this forward to where we want it to be,
which is a classifier
for all endangered species all over the world.
Here's to lots and lots of points on the map,
and thank you all for listening.