Polygraph examinations play an important role in situations such as event investigations and employment screenings. Polygraph examinations monitor physiological reactions to determine whether an individual is being deceptive or telling the truth. Being able to determine whether an individual is being deceptive in an unbiased way is of utmost importance. Many of the scoring techniques used to determine deception in a polygraph test are subjective and depend on the polygraph examiner.

So is it lies, more lies, or just statistics?

Join us to find out as we explore polygraph data using JMP Functional Data Explorer to make curve analysis a breeze. We also use the Fit Model platform for regression analysis and the Predictive Modeling Partition platform to flush out the most important factors in predicting deception. Sorting out lies and statistics gets messy, so Workflow Builder is used to keep track of all the data cleaning and analysis steps. Come join us to sort out the truth using JMP’s powerful analytics as our witness.


Welcome, everybody, and thanks for joining us for Lies, More Lies, or Just Statistics. We're going to watch the Liar Liar movie today. I'm just kidding. A little bit of background on how we got here: I'm Clark Ledbetter, a systems engineer, and Megan Pennington, a statistician from StatSense, is with us as well.

To provide a little bit of background, I met Megan at Y-12 when I was covering the federal government. If you're a fan of Oppenheimer, you'll know Y-12 produces all of our enriched U-235 uranium. We were chatting last year during an on-site visit, and I was asking about security at Y-12, how you interview, and all that other stuff. Megan, you mentioned you guys use polygraph data.

Yeah, we have a polygraph for the background check.

I was amazed. Polygraph data? Huh? That would be interesting to take a look at, to see if we could do something, because I think this whole polygraph business is willy-nilly sketchy. Too much human intervention in here. Megan, what do we have for data?

You're not getting Y-12's polygraph data, but conveniently enough, I have a client who actually performs polygraphs, and so we're going to take a look at his data set today.

All right. Cool. Let me move to the next slide.

The whole point of a polygraph is trying to determine if someone is telling a lie by recording physiological responses, because the thought is, when there's stress in the body, you're going to see spikes in those physiological signals. There are basic detection sensors that are required, like blood pressure, cardio, and sweating. You can add as many sensors as you would like: things like detecting foot movement, arm movement, jaw clenching. It can get crazy with the setup.

Let's take a look at the setup.

This is a typical setup. This one has six sensors; you'll see in our data set we had 12. You can see here you have a blood pressure cuff, you've got the bands across the chest for breathing, and you've got sweat sensors. There's a lot of variation when you go to set this up. You've got to think about different-sized people. Are you getting it on the same way every time? Is it repeatable?

I think if I put my process engineering hat on, because that's where I came from, this is riddled with variation. Do these guys even know anything about measurement systems analysis, gage R&R?

Sadly, no.

No, sadly, no. You mentioned we have 12 sensors, so there's even what, foot movement, seat movement? We've got all the sensors hooked up here.

The polygraph examiner is free to choose their own question technique. At the very beginning of polygraph measurements, the relevant/irrelevant technique is how they got started. Basically, the relevant questions are tied to the thing you're trying to figure out whether they're lying about. The irrelevant questions have nothing to do with that at all. The thought was, well, we'll see spikes on the relevant questions and not the irrelevant questions. That wasn't true. It's really flawed because people are human. They're anxious getting hooked up to all this equipment. You've got all this noise in the data.

Then they came along and developed the control question technique, which still uses relevant and irrelevant questions, but they add in control questions that they know the answers to, using those as a baseline to compare against the relevant and irrelevant questions.

Then there's another technique. Say someone murders somebody or steals something. They'll use the concealed information technique because it pinpoints exactly the thing they're trying to figure out whether the person is lying about. But great news, we have no murderers in this state.

All right. I was going to check. We're not putting any murderers on the stand here.

Not today. The polygraph examiner can choose different techniques. They can mix and match and combine methods. It's all up to the polygraph examiner. In our data set, we're going to be looking at the control question technique, CQT.

CQT with 12 sensors. Shall we take a look at what that data looks like?

Let's take a look. This is the graphical output that comes from the software for all the sensors. Each sensor has its own curve. You can see we've highlighted breathing, cardio, and sweat. The gray and green bars are where the examiner asks the questions and where the person actually answers them.

There are a couple of ways you can analyze these curves in the software. One is an algorithm that comes standard with the software, but it's really a black box. We don't know what the heck's going on in there. We don't know what they're using. Also, the examiner can make their own judgment based on trained eyeballs from their professional training.

With the eyeballs.

What they do is they go in there, and they look at the peaks and the valleys and make their own personal assessment based on formal training, whether that's significant or not.

The timing of these zones, the examiners do that also by eye?

The examiner is actually hitting the space bar when they believe they've started the question and when they've ended it, and when they believe the person has started and finished answering.

I'm scared. There's too much. I'm not putting my truth or lie in this guy's hands.

There's a lot of variability here.

How about the results? What do the results look like?

When he or she goes through and scores the data set, they give a rating from -2 to 2 in increments of one, with two being more severe. And you can see they actually ask these questions three different times in random order. They're trying to get some randomization in there to help with some of the variability. Then they take these numbers and add them up to decide: is it truth or lie, based on these subjective numbers.

This is their own scoring?

This is their own scoring method based on their professional training.
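
For readers who want to see the arithmetic, the manual scoring described above boils down to summing per-asking ratings. Here is a minimal sketch in Python, with entirely made-up ratings and an illustrative sign convention (negative total = deceptive), not the client's actual rules:

```python
# Hypothetical manual tally: each relevant question is rated -2..2 on
# each of its three askings, then the ratings are summed per question.
ratings = {
    "R1": [-2, -1, -2],   # made-up scores for question R1
    "R2": [0, 1, -1],     # made-up scores for question R2
}

for question, scores in ratings.items():
    total = sum(scores)
    call = "deceptive" if total < 0 else "truthful"   # illustrative cutoff only
    print(f"{question}: total {total:+d} -> {call}")
```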

Here's what I smell. I smell human error.

Absolutely. Then the algorithm I mentioned previously, this is the report you would see, and it indicates truth or lie. You can see they've got some multinomial and some probability and some fancy statistics words there. There's something going on behind the scenes, but we really don't know what it is.

Unknown. We have an examiner making his own adjustments. I don't know. I think we can do better with some JMP Pro tools to try to remove this human variation. What do you think?

I think I agree.

An alternative approach in JMP. What we wanted to do here on this slide is just walk through, at a high level, what the workflow is. Of course, most analytics, really all analytics, should start with cleaning and visualization. We'll take a look at some of the challenges; we had to spend a lot of time on this cleaning step, and it actually came after the visualization.

We have a lot of sensors, like I mentioned. Which ones are significant? Which ones matter? We looked at the Predictor Screening platform to try to weed them out.

I've been telling Megan about Functional Data Explorer and analyzing sensor curves. I think this tool is the best thing since sliced bread. I was sure this was going to be a home run, that we'd be able to figure something out as we got going, which turned out not to be very helpful. This is one of the tools that we tried to use. I'll talk a little bit more about that in a bit.

Since FDE didn't pan out, we looked at the Model Screening platform, which was really great because you can include tons of predictive models, and you let JMP decide which one is actually the best fit. Then once we had our best-fit model, and it's important to mention we had a control data set that had known truth and known lie questions, which is what we used in the Model Screening platform to determine our best model. Then we took that model and applied it to the real data, for which we did not know whether it was truth or lie.

At this step, the model had never seen that data. We took our best model and threw it at the real data. Let's take a look, just at a high level, at this workflow that we went through and some of the platforms, and then we can break out for a bit of a demo. Of course, we had to bring the data in. I'll talk a little bit more about the cleaning. I did leverage the Distribution platform, definitely distributions, trying to understand the various charts. Megan mentioned we had three individuals and multiple charts. We scanned around there, but the meat of it was looking at these sensor trends and what they really look like in Graph Builder.

As I mentioned, we utilized the Predictor Screening platform just to try to get a feel for which ones are even significant and matter. It's just a real simple look. We saw EA, cardio, foot movement, and seat movement were the most significant. Actually, the sweat gland and the cardio seem to be really significant in the literature across the board. They seem to be the big hitters, and JMP agrees with that.

Even without knowing any of this stuff we found those signals, which is pretty encouraging. We didn't really have to know all those things.

Exactly.
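
JMP's Predictor Screening ranks predictors using tree-ensemble contributions. For anyone following along outside JMP, a rough Python analog is sketched below; the file and column names are assumptions, and a random-forest importance stands in for the platform's contribution statistic.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Assumed layout: one row per question asking on the control data set,
# sensor summary columns, and a known Truth/Lie label.
df = pd.read_csv("polygraph_control.csv")            # hypothetical file
sensors = ["EA", "Cardio", "FootMovement", "SeatMovement",
           "BloodPressure", "UpperBreathing"]         # assumed column names

forest = RandomForestClassifier(n_estimators=500, random_state=1)
forest.fit(df[sensors], df["TruthLie"])

# Rank sensors by importance, analogous to the screening report.
importance = pd.Series(forest.feature_importances_, index=sensors)
print(importance.sort_values(ascending=False))
```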

Now, the Functional Data Explorer platform, if folks aren't aware of what this is: think of these sensor curves that we're getting. They're essentially just points. What this platform does is fit essentially a spline, we call it a functional model, to each one of those unique curves. The goal is to look at how some of these inputs change the entire curve, versus what the examiner was doing. I'm going to take a peek. The goal here was: can we pick up a signal in the entire shape?

This is what it would look like at the end. You'll see this: this is the sweat gland signal over time, and I'm toggling it between lie and truth. We're looking at that signal. Unfortunately, you don't see repeated structure in the signal. It started to feel like this was noise. We'll do an example in the demo.
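
What FDE is doing conceptually is fitting a basis-function (spline) model to every curve so the whole shape can be analyzed as one observation. A minimal Python sketch of that idea, fitting a smoothing spline per run of one sensor and evaluating all fits on a common grid (the file, column names, and time window are assumptions):

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline

df = pd.read_csv("ea_raw_curves.csv")   # hypothetical: RunID, Time, EA_Raw

splines = {}
for run_id, curve in df.groupby("RunID"):
    curve = curve.sort_values("Time").drop_duplicates(subset="Time")
    # One smoothing spline per curve; s controls smoothness and would be
    # tuned (FDE chooses its basis and number of knots for you).
    splines[run_id] = UnivariateSpline(curve["Time"], curve["EA_Raw"], s=1.0)

# Evaluate every fitted curve on a shared time grid so shapes can be compared.
grid = np.linspace(0, 20, 200)           # assumed 20-second answer window
fitted = {run_id: spline(grid) for run_id, spline in splines.items()}
```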

As we mentioned, FDE didn't take off as much as we had hoped, so we looked at the Model Screening platform. As I mentioned, you can choose from a whole host of models. What really shook out to be significant was the boosted tree. And you can see there are tons of analytical options in this platform; it's really awesome. We really homed in on the decision threshold because we were trying to decide which is worse: calling someone who lied a truth teller, or calling a truth teller a liar? We're looking at those false positives and false negatives.

We couldn't really decide which one was more important than the other, so we wanted to choose a decision threshold that minimized both. That actually ended up being 0.55. You can see the red and green on the graphs: the red is the misclassifications and the green is where we got it correct. Once we had this model, we actually applied it to our real data set to validate.
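
The model screening step and the threshold choice can be mimicked outside JMP. The sketch below is a conceptual stand-in, not the presenters' workflow: it fits a gradient-boosted tree on a hypothetical control table with known truth/lie labels, then sweeps the probability cutoff until false positives and false negatives balance. The 0.55 value came from their data, not from this code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

# Hypothetical control data: one row per question asking, the four screened
# sensor summaries, and a known label (1 = lie, 0 = truth).
df = pd.read_csv("control_questions.csv")
X = df[["EA", "Cardio", "SeatMovement", "FootMovement"]]
y = df["Lie"].to_numpy()

model = GradientBoostingClassifier(random_state=1)
prob_lie = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]

# Sweep cutoffs; keep the one where the two error rates cross, so false
# positives and false negatives are balanced.
best_t, best_gap = 0.5, np.inf
for t in np.linspace(0.05, 0.95, 91):
    pred = (prob_lie >= t).astype(int)
    fp = pred[y == 0].mean()          # truth tellers called liars
    fn = 1 - pred[y == 1].mean()      # liars called truth tellers
    if abs(fp - fn) < best_gap:
        best_t, best_gap = t, abs(fp - fn)

print(f"balanced decision threshold: {best_t:.2f}")
```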

Why don't we drop out to JMP and run through some highlights of this workflow? Let me open up JMP. Here's the raw data table. Remember, the first step, we wanted to go and do some visualization. I'm going to open up Graph Builder here. I have a column switcher that switches through all these sensors. Just to start with, we were kind of just scanning through these signals to get an idea: do we see some that have unique structure? Does it look like noise? That type of thing.

There was one signal we mentioned, the sweat gland. If I stop this and go to the sweat gland, you'll get an idea here. With this local data filter, you have your three individuals and your three charts. It's interesting to try to figure out the timing: the examiner started the test and was running through time here, and we had to figure out what the sequence and order of these were.

Then the key challenge was breaking these up, finding the start and stop for each one of these sensor curves. This was with the Functional Data Explorer platform at the forefront: if we're going to use this, I need to chunk these curves up and get them all onto the same x-axis. We had a lot of cleanup work there. You also notice, if I bring this in, this trend here starts to get me wondering about the metrology. Something's going on. Something's drifting.
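
The chunking itself, once the examiner's start and stop marks are known, is just restarting the clock at zero inside each question slice so every curve shares one x-axis. A hedged sketch (the file, the InWindow flag, and the column names are assumptions):

```python
import pandas as pd

raw = pd.read_csv("polygraph_raw.csv")   # hypothetical long recording

# Keep only rows inside each question's answer window, then restart time
# at zero within each Individual x Chart x Question slice so all curves
# line up on a common x-axis for FDE.
win = raw[raw["InWindow"] == 1].copy()
win["AlignedTime"] = (
    win["Time"]
    - win.groupby(["Individual", "Chart", "Question"])["Time"].transform("min")
)
```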

Let's take a look. If we add individuals, we start to see different means and even different behaviors. I'm suspicious of the actual measurement system here and how this is going. This is a signal that we fed into FDE to see if it would work. Let's actually look at the FDE platform.

We tried our best. We looked at all these different signals in this platform. What I really wanted to do is the sweat gland one from the ground up, so folks get to see what that tool does and see what we ran up against. I'm going to do that one from the ground up. I'm going to go to Analyze, Specialized Modeling, and then Functional Data Explorer.

Now, my output is my sensor. I'm going to go grab the raw sweat gland. The X input is our time. The Function ID is the run ID; that lets JMP know each unique curve that we had to go slice up. For the supplementary variables, we want to see whether truth or lie impacts that curve. Anything else we should add?

I think individual.

Probably should. It's a random effect.

Exactly.

Let me go add individual in here, and then I'm going to click OK. Right out of the gate, you have to go through a visualization in this platform. You'll see this baseline shift that we see. You don't see a lot of signal as you start looking through each raw one. We tried a lot of different transformation and cleanup tools, to the point of exhaustion.

For instance, over here in this spectral one, we did a baseline correction; we added a linear fit to that. You'll see what the impact is: everything moves down to a common scale, and you start to see some features and signals show up. I think we even tried a standard normal variate here as well. You start to see signals, but the concern is that we're not even close to what the original signal was. We just keep moving further away.
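
The two cleanup steps mentioned, a linear baseline correction and a standard normal variate (SNV) transform, are standard spectral preprocessing moves. Here is a minimal per-curve version in Python, shown only to illustrate the idea, not the platform's exact implementation:

```python
import numpy as np

def remove_linear_baseline(t, y):
    """Fit and subtract a straight-line drift from one curve."""
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)

def snv(y):
    """Standard normal variate: center and scale one curve to unit spread."""
    return (y - y.mean()) / y.std()

# Toy usage on a synthetic drifting signal.
t = np.linspace(0, 20, 200)
rng = np.random.default_rng(0)
y = 0.3 * t + np.sin(t) + rng.normal(0, 0.1, t.size)
y_clean = snv(remove_linear_baseline(t, y))
```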

This starts to look like noise. Let me go back and take a look at my EA raw profiler. If I blow this up, after much time and effort, and close that one down, you'll see here: here's my EA raw signal, the sweat gland over time. I can toggle between truth and lie, and then as I toggle between individuals, I'm really not picking up unique features in this.

Just for comparison, I'm going to show you a good FDE example, just so you have an idea of what this probably should look like. This one is aluminum stress-strain. I'm only going to show the profiler here. This example was born of a designed experiment studying the stress-strain behavior of aluminum alloys.

This one is how it should look. They had some lot-to-lot variation, looking at a temperature and then different specimen types. You can see as I toggle this, my stress-strain curve is repeatable. I see unique features. I come down here, and I can start to see a unique feature that looks different. We just weren't getting that with the FDE. We even went so far as to export out the FPCs and use the predictive modeling tools. It didn't yield a good model. We tried. Megan, I thought it was going to be the best thing since sliced bread. I really did.

We did. We spent a lot of time-

Where do we go from here?

I think we utilize JMP's model screening platform.

All right. Let's do that. I'll let you share.

Awesome. Thanks, Clark. Like you said, we ended up going with the Model Screening platform. We ended up using the top four inputs we found from the predictor screening, which were cardio, EA, seat movement, and foot movement. I'll show you how you would set this up. You go to Predictive Modeling, Model Screening.

We're trying to determine truth or lie, and then we're going to look at those four sensor inputs that actually showed some significance: raw seat movement, foot movement, cardio. Perfect. You can see down here, by Method, all the options you have. We looked at everything. For this, I would just set up the boosted tree. Then we actually did some k-fold validation there as well. This takes time to run, so I'm going to go over here; I've already got this run to save us some time. Here we go.

Like I mentioned earlier, there are tons of analysis techniques that come out of this platform. Really, we focused in on this decision threshold. Like I had mentioned, we were trying to minimize the amount of red; as you can see, as I move along the probability threshold for truth, that changes. We wanted to go exactly where the two curves intersected, which minimizes the number of misclassifications, false positive or false negative.

We went in here and used 0.55. Once we did that, we were able to export the prediction formulas: you go to the red triangle, then over here to Save Columns, Save Prediction Formula. Then this is what we brought into the real data set. All right, Clark, you want-

Let me transition back here; I'll share again. This is the actual real data, and you'll see we have four questions. It was still a similar flow. Three different charts, correct, Megan?

Yes.

They went through and did that; they reordered the questions. We brought those formulas over. That's at the tail end of the data table: here are our formulas, and here's our most likely classification. Then from that we debated what's going to be the easiest way to classify our A, B, and C individuals. Are they telling the truth or are they lying?

I pretty much just did a simple tally where we had each individual and then each question. We are tallying within question within individual. If we go take a look at that, let me get back out to our presentation. The key question is: who's the Pinocchio? Here's what we ended up with as just a straightforward assessment of these three individuals, on brand-new data.

Remember this threshold: this is a ratio of lie to truth. As you cross over this 50% threshold, the likelihood starts to turn toward lying. Megan, here's my conclusion: I've got A as a truth teller, and I've got B and C as liars. Am I right? Were we able to predict this correctly?

Right. We matched the examiners and the algorithms from the software.

We did it with no human judgment.

Absolutely.
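
The final verdict described above is simple bookkeeping: tally the per-asking classifications within each question and individual, then compare the lie ratio to the 50% line. A sketch of that tally (file and column names assumed; the exact aggregation the presenters used may differ):

```python
import pandas as pd

# Hypothetical scored table: one row per individual x question x asking,
# with the model's most-likely classification already saved as a column.
scored = pd.read_csv("real_data_scored.csv")    # Individual, Question, MostLikely

lie_per_question = (
    scored.assign(is_lie=scored["MostLikely"].eq("Lie"))
          .groupby(["Individual", "Question"])["is_lie"]
          .mean()                                # lie fraction per question
)

lie_ratio = lie_per_question.groupby("Individual").mean()
verdict = lie_ratio.gt(0.5).map({True: "liar", False: "truth teller"})
print(pd.DataFrame({"lie_ratio": lie_ratio, "verdict": verdict}))
```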

If we wrap this whole thing up, I think that's one of the biggest takeaways: it's not just lies, but it is statistics. Using these methods and the Pro tools, if you can remove all that human aspect and judgment out of it, I would maybe sign up for a polygraph. As is, I wouldn't do that. What do you say on the flip side of this, Megan?

The other benefit is the analysis time. It takes him days, and we reduced it to minutes, which puts more money in our pockets. We all like more money, so you can't argue with that.

Oh, absolutely. As we stop here at this point, we've seen the current state with the current testing method and algorithm. How do we improve it? What would be our next steps to improving this?

The next step is a DOE. We've got to set it up with different individuals, different sensor setups, different configurations. We'd also be looking at the timing the examiner marks, how long he believes it takes to ask the question and for the individual to answer it, those gray and green areas that he's looking in. There's a lot of variability there, so we would look at that as well.

We're back full circle? All this predictive modeling, and we're back to design of experiments. To me, I don't know, I'm a process engineer; it's where we should have started. Yeah, we should go do this. To DOE or not to DOE, that's not a question. We covered that at last year's Discovery. I think that's it, we're wrapped up for today. We're going to open it up for questions.

Presented At Discovery Summit 2025
