I'm a senior developer in the Design of Experiments and Reliability group
here at JMP Statistical Discovery.
Today I have the privilege of telling you
about a very interesting project that I was able to be a part of
concerning classification of species and sex
within a small mammal group called Fishers
using the Footprint Identification Technique.
Fishers, I'll give you a quick image here,
so here's an example of a fisher.
To me it looks like a bit of a weasel or ferret-type animal.
I know that's definitely not the same species,
but they're a small mammal,
and we're particularly interested in fishers located in the Sierra Nevada,
as those are a federally endangered species.
Specifically, we'd like to be able to identify
the presence of females,
as a larger number of females indicates a very healthy population.
They're also vital to helping develop effective conservation strategies.
Now, the way we intend to do that
is use what's called the Footprint Identification Technique or FIT.
This has been made popular through WildTrack,
and it is a non-invasive method for identifying individuals
based on images of their tracks.
This is especially helpful
since you may not be able
to actually see a fisher in the wild or capture them,
but their tracks are everywhere so that should be helpful to identify them.
Using JMP, we were able to create a technique
to distinguish fishers from a nearby species
known as Pacific martens,
as well as distinguish sexes within species.
The way this works is we started with a data set
of around 160-something marten tracks
and well over 300 fisher tracks, from about 34 males and 27 females.
What they would then do, as you can see here on the track image,
is identify seven landmark points, as we call them,
and then from those,
we could compute well over 120-something features
consisting of lengths, distances, angles, and areas.
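
To make that step concrete, here is a minimal Python sketch of how distances, angles, and an area can be derived from a handful of (x, y) landmark points. It is only an illustration under assumptions: the function names, the feature names such as `dist_1_2`, and the choice of which angles and which area to compute are hypothetical, and they do not correspond to the actual FIT feature set or to labels like V16 used in this talk. The area calculation also assumes the landmarks are listed in order around the outline.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def angle_deg(a, vertex, b):
    """Angle in degrees at `vertex` between the rays toward a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))


def polygon_area(points):
    """Shoelace formula; assumes the points are ordered around the outline."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def track_features(landmarks):
    """Build a dict of hypothetical features from a list of landmark points."""
    feats = {}
    # One distance per pair of landmarks.
    for i, j in combinations(range(len(landmarks)), 2):
        feats[f"dist_{i + 1}_{j + 1}"] = distance(landmarks[i], landmarks[j])
    # One angle per ordered triple, measured at the middle landmark.
    for i, j, k in combinations(range(len(landmarks)), 3):
        feats[f"angle_{i + 1}_{j + 1}_{k + 1}"] = angle_deg(
            landmarks[i], landmarks[j], landmarks[k]
        )
    # One area: the polygon through all landmarks.
    feats["area_outline"] = polygon_area(landmarks)
    return feats
```

With seven landmarks this sketch yields 21 distances, 35 angles, and 1 area, which is fewer than the feature count mentioned above; the real feature set evidently includes additional derived measurements.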
Using those features,
we would then feed them into a linear discriminant analysis,
which we could then use to discriminate among species
and then sex ID within species.
To help assess that fit, we split the data into 50% training,
and for the remaining 50%,
we split roughly evenly between validation and testing.
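
As a rough sketch of that 50/25/25 split, here is one way to do it in Python. The function name `split_50_25_25` and the `seed` parameter are hypothetical, and the talk does not say whether the split was made by track or by individual, so this simply splits whatever list of IDs it is given.

```python
import random


def split_50_25_25(items, seed=0):
    """Shuffle items, then split into about 50% train, 25% validation, 25% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n = len(shuffled)
    n_train = n // 2
    n_valid = (n - n_train) // 2
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

If several tracks come from the same animal, passing individual IDs rather than track IDs keeps one animal's tracks from landing in both the training and test sets.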
Prior to the modeling,
we also tried to look at the effect of track orientation,
so we would flip the left tracks horizontally to match the right,
and then also any potential bias from the observers.
These are people identifying landmark points,
so we wanted to check and make sure
that any variation there did not affect our outcomes.
Thankfully, both the orientation and the observer bias
did not have a significant effect on our outcomes.
What brought my colleague Ryan and me into the project was that they had
noticed that some of the tracks, as they were classified,
seemed to have a little bit too much spread in them,
to the point that maybe there were actually multiple individuals.
The way they would collect this data
is there would be a little cage area out in the woods.
Fishers could easily go in and out,
and there was a track plate in the bottom that would capture their footprints,
and there were also little spurs that would capture a bit of their hair.
It didn't hurt the animal.
They had no idea what was going on.
What they would then do is take some samples of those hairs
and send them out for genetic testing,
which was a bit of a long and expensive process.
Now, because of the way things were sampled,
you might have a sampled hair that would identify the animal
as, say, male, but what could have happened
was that a male and a female might have gone in,
and you only caught hair from one of them,
so the tracks might indicate potentially multiple individuals,
whereas the genetics said there was only one.
What they wanted was
a more data-driven method, if you will,
to identify potentially misclassified multiple individuals
that we could then exclude from our analysis
so that it wouldn't bias the results.
Before we actually got into that procedure,
one of the things that we would do is use JMP's Predictor Screening tool
to identify, for each response of interest,
what were some of the top predictors?
Notice that for species and sex ID here,
there are actually a lot of common features
that are able to distinguish between the two
or at least have a strong ability to help distinguish between the two.
Much more so with the species than the sex.
We've shown you what these variables look like over here,
so area one is the complete shaded region.
We've got some distances, V16, V15.
You'll notice a lot of them have to do essentially with the size of the track.
We've got some big distances in there.
I'll get back to these in a second, but using those top features,
let me get back to a full screen of that.
Using some of those top features,
we would then make a plot that looks like this.
This is just plotting it by the individuals.
All the red ones here are females. All of these are males,
so already visually, you can tell why these are some of the top predictors.
Just visually, you can see those groupings,
clear groupings between the sex ID.
What we've identified with these arrows is you'll notice a big spread.
You've got a cluster here and here.
Got a little bit here and there, especially here and there.
This is what they were interested in, especially with the males,
because what this could be is...
It could be the same male with just a lot of spread,
but that's a bit unlikely.
We could have a male and a young male,
or we could have a male and one that's actually more of a female,
but we don't really know.
They wanted a more data-driven method to say,
is this something we should be concerned about?
Is that spread too much?
What we did was use a control chart, which comes from industrial statistics.
We thought that was actually ideally suited because control charts
are built for identifying parts that are out of spec,
and so what we did was create a control chart. Here are females and males,
and notice they each have their own limits.
This is because there are potentially multiple tracks for each individual,
so we could get a sense of their spread on an individual by individual basis.
You'll see we flagged some individuals that might have too much spread.
This is an S-chart; the S stands for sigma.
We're looking at the spread, if you will.
We've got a couple of individuals where
maybe there's a bit too much spread in there,
so that could potentially mean that there might actually be multiple individuals.
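
For readers who want to see the mechanics, here is a minimal Python sketch of an S-chart with a separate set of limits for each individual. It is a sketch under assumptions, not the JMP implementation: it uses a pooled estimate of the within-individual standard deviation and 3-sigma limits, which is one common textbook choice, and the names `c4` and `s_chart` are just labels for this example. Each individual is treated as a subgroup, so the limits depend on how many tracks that individual has.

```python
import math
from statistics import stdev


def c4(n):
    """Unbiasing constant: E[s] = c4(n) * sigma for normal samples of size n."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    )


def s_chart(groups):
    """groups maps an individual ID to its measurements (at least 2 each).

    Returns, per individual, the sample standard deviation, the 3-sigma
    limits for that subgroup size, and whether s falls outside them.
    """
    # Pooled within-individual sigma (an assumption; other estimators exist).
    num = sum((len(v) - 1) * stdev(v) ** 2 for v in groups.values())
    den = sum(len(v) - 1 for v in groups.values())
    sigma = math.sqrt(num / den)

    rows = {}
    for key, values in groups.items():
        n = len(values)
        s = stdev(values)
        center = c4(n) * sigma
        half_width = 3.0 * sigma * math.sqrt(1.0 - c4(n) ** 2)
        lcl = max(0.0, center - half_width)  # a standard deviation cannot be negative
        ucl = center + half_width
        rows[key] = {"n": n, "s": s, "lcl": lcl, "ucl": ucl,
                     "flag": s > ucl or s < lcl}
    return rows
```

To mirror the separate female and male charts, one would call `s_chart` once per sex on a single feature of interest. An individual whose `s` exceeds its upper limit is a candidate for "possibly more than one animal".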
On that basis, we then excluded those individuals from the final analysis,
and speaking of the final analysis,
once we ran everything through the linear discriminant analysis,
what we found was, for distinguishing between species,
we only needed one feature, that is this V16 right here.
I call it the distance between the thumb and maybe the middle finger or something.
Those are not formal biological terms.
Please don't quote me on that.
But just visually, that's what I see, so that's a big distance measure.
Using just that,
we were able to successfully distinguish between species
with a 99% successful classification rate.
We missed only four out of 500 tracks, so that is an incredible result.
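
To show what a one-feature linear discriminant analysis amounts to, here is a minimal Python sketch. It assumes a single numeric feature, a pooled within-class variance, and priors taken from the class proportions; the function names and the numbers in the usage example are made up for illustration and are not the study's data or JMP's implementation.

```python
import math
from statistics import mean


def fit_lda_1d(values_by_class):
    """Fit a one-feature LDA. values_by_class maps a label to a list of floats."""
    n_total = sum(len(v) for v in values_by_class.values())
    means = {k: mean(v) for k, v in values_by_class.items()}
    # Pooled within-class variance, shared by all classes (the LDA assumption).
    ss = sum((x - means[k]) ** 2 for k, v in values_by_class.items() for x in v)
    pooled_var = ss / (n_total - len(values_by_class))
    priors = {k: len(v) / n_total for k, v in values_by_class.items()}
    return {"means": means, "var": pooled_var, "priors": priors}


def predict_lda_1d(model, x):
    """Return the label with the largest linear discriminant score at x."""
    def score(k):
        mu = model["means"][k]
        return (x * mu / model["var"]
                - mu * mu / (2.0 * model["var"])
                + math.log(model["priors"][k]))
    return max(model["means"], key=score)


# Usage with made-up distances (arbitrary units), not the study's measurements.
model = fit_lda_1d({
    "marten": [28.0, 29.0, 30.0, 31.0, 32.0],
    "fisher": [38.0, 39.0, 40.0, 41.0, 42.0],
})
label = predict_lda_1d(model, 33.0)
```

With two classes and equal priors, this reduces to a single cutoff halfway between the two class means, which is why one well-separated distance can be enough.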
For the sex ID within fishers,
we used just these two features, V15 and V6,
which is a distance between what I call the thumb and the upper palm.
Again, not formal biological terms.
By using those two, we got a successful classification rate of around 90%,
and most of the individuals that we misclassified,
were actually males misclassified as females.
In our interpretation, what that might mean is
they could have been actual females,
or maybe they could have also been young males.
In either case, both are strong indicators of family units
and thus potentially healthy growing populations.
That was our contribution to this project.
We hope it goes on to provide a significant impact
in conservation of the species.
If you have any other questions, I'll be around at Meet the Experts
and also at the poster presentation session.
I'd be happy to answer them there.
Enjoy the rest of the summit.