Things I Wish I Knew When I Started Using JMP (Episode 3)

According to a JMP Systems Engineer, it's the first thing you should do when handed a new data set: graph your data. In this episode of Things I Wish I Knew When I Started Using JMP, Systems Engineer Mike Anderson shows you where to find and how to use the Analyze menu in JMP. Learn how the Analyze menu can help you understand the basics of your data and variables (Distribution and Fit Y by X); tackle more complex questions, including regression (Fit Model) and machine learning and artificial intelligence methods (Predictive Modeling, Specialized Modeling, Screening, Multivariate Methods and Clustering); and better understand process, quality and people (Consumer Research).

Full Transcript (Automatically Generated)

Morning, everyone. It's a beautiful Monday here in New York; the sun is shining, finally. Today we're going to be going over another of the things that I wish I had known when I started learning JMP. It's also something that I like to tell a lot of people when they're first getting started using JMP. I've been working with a lot of people who have been taking the demo downloads that we have lately, and this is one of the things I tell them on the first day, because this can be a little bit confusing coming from different software: the way JMP organizes the Analyze menu is entirely different from what you would expect coming over from somebody else's software.

As always, I'm going to be taking your questions. So if you've got any questions, find that Q&A window, and at the end of the session I'll try to answer as many of them as I can. I got one that came in over the weekend, actually from the first episode of this series, where they wanted to know how I did something, and I'll be going through that as well. To show them how we did that, we're going to dust off some fun and go into image analysis to do it.

All right, so first thing: the Analyze menu. When I first started learning the Analyze menu, it really didn't make sense to me until I started teaching other people to use the software. As I started refining my idea of what the analytical workflow is (a lot of the ideas that we talk about in this series came from those early experiments in how to teach JMP on my own), I started to understand how the Analyze menu is laid out and that there's a method to this apparent madness.

What I want to share with you today is a general guideline about how to guess where the things you're looking for are, moving your guessing around in the menu from just a wild guess to a scientific wild guess. And we'll leave the middle of that out for the sake of the recording. So let's get started. What I've got right here is a screen capture of the Analyze menu, so I can use my little laser pointer and point around. Let's take a second and step back. When you're handed a brand new data set, what's the first thing you should do? Well, if you were here for the first episode of this series, you would know the first thing you should do is graph your data. If we make zero assumptions about our data, zero assumptions about shape, about relationships, about anything, the first type of graph that you would probably run into is something called a histogram.

And it makes no assumptions. It's great for understanding how data is shaped, looking for outliers, and doing basic statistical tests. There are lots of things you can do with just a histogram. If we come under the Analyze menu, the first thing we run into is this platform called Distribution. And the graph that goes with Distribution? Remember from the second thing we talked about, the JMP workflow: get a graph, look at the graph, ask questions about what you see, and then push the answer button. The graph that comes with the Distribution platform is a histogram. So that's if we want to just look at one thing. Now, if you look at your data and then move a little bit further down the chain of logic that you might be following, or the line of inquiry that you might be pursuing, you might start proposing relationships.
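As a rough illustration outside JMP (not something shown in the episode), the idea behind a histogram can be sketched in a few lines of Python: split the range of one variable into equal-width bins and count what lands in each. The function name `histogram`, the bin count, and the measurements are all made up for this sketch, and JMP's own binning rules may differ.

```python
def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return lo, width, counts


# Hypothetical measurements, invented for illustration only.
data = [28.1, 29.4, 30.2, 30.8, 31.0, 31.5, 29.9, 30.1, 35.7, 30.4]
lo, width, counts = histogram(data, 4)

# Print a crude text histogram: one row per bin, one star per value.
for i, count in enumerate(counts):
    left = lo + i * width
    right = left + width
    print(f"{left:6.2f} to {right:6.2f} | {'*' * count}")
```

Even in this crude form, the lone value in the last bin stands apart from the rest, which is the kind of outlier question a histogram prompts before any assumption about shape is made.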

I think A relates to B. I think there's a correlation between A and B. Those are all possible things. And with those bivariate relationships, we're starting to make inferences about whether something is a response or a factor, whether it's an X or a Y. In that case, we would come down to Fit Y by X. And there, because of some clever coding, a lot of different plots and a lot of different comparisons are available, all of them comparing one variable to another, most of the time making an assumption that one of them is a response and one of them is a factor.

So it would seem, at first glance, that the Analyze menu is organized by the number of variables you're trying to compare. But then we get to Tabulate, and then we get to Text Explorer, and that logic kind of breaks down. So let's skip those two for just a minute and move down to Fit Model. What we're doing here is proposing more complex relationships. It's multivariate, yes, but we're proposing different levels of complexity in our models. And that's the trick. That's the key. The JMP Analyze menu, at least from what I've seen and the way that I teach it, is organized so that as you go down the menu, you're asking more and more complex, more and more specialized questions. Which is why down here at the bottom we have people, Consumer Research, because, let's be honest, people are generally kind of nutty and very difficult to predict.

And because of that, the data associated with people is invariably very complex and very hard to understand. So we've got some very specialized tools in that subgroup down there at the bottom to help us navigate the complexities that come with the human condition. Now let's look a little bit at the middle here. Let me bring up an annotated version of this. So we talked about one variable, two variables, and Fit Model for regression. In here we have machine learning and AI-type models. We also have the multivariate platforms. Screening is kind of the odd man out, except that under the hood a few of the screening platforms use machine learning technology, so that makes some sense as to why they're there. Then we get into Quality and Process, so those are more specialized tools. And, like I said, people are down at the bottom because it's incredibly complex data.

Now, let's scoot back up to Tabulate and Text Explorer. These are a problem in the scheme, because it doesn't necessarily make sense for them to be right there, unless you understand that one of the roles Tabulate has is to transform data from one form to another. It's good for doing cross tabs (well, not so much cross tabs), pivot tables and things like that. So its goal is to summarize, to reshape data so that other parts of the software can take the inputs. It's a transformative tool. Text Explorer is the same way. A little cue about the role of a platform in JMP is when we put "Explorer" at the back of something: one of the goals of the platform is to transform data from something, in this case text, which is really tricky to work with, into something that we can use for Fit Model, for doing sentiment analysis, for doing all these different things. So those two platforms are bridging tools: their goal is to take us from fairly simple (not simplistic, but fairly simple) types of analyses to the more complex multivariate analyses that come along.
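To make the "reshape so other tools can take the inputs" idea concrete, here is a minimal pivot-table sketch in Python. It is not how Tabulate works internally; the function `pivot_mean`, the column names, and the data rows are all hypothetical, chosen only to show raw rows being summarized into one value per row-and-column cell.

```python
from collections import defaultdict


def pivot_mean(rows, row_key, col_key, value_key):
    """Summarize raw rows into the mean of value_key per (row_key, col_key) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in rows:
        cell = (record[row_key], record[col_key])
        sums[cell] += record[value_key]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical process data, invented for illustration only.
rows = [
    {"machine": "A", "shift": "day", "thickness": 10.0},
    {"machine": "A", "shift": "day", "thickness": 12.0},
    {"machine": "A", "shift": "night", "thickness": 11.0},
    {"machine": "B", "shift": "day", "thickness": 9.0},
]

table = pivot_mean(rows, "machine", "shift", "thickness")
for (machine, shift), mean in sorted(table.items()):
    print(f"machine={machine} shift={shift} mean thickness={mean:.1f}")
```

The output is a smaller, summarized table that a later modeling step could consume, which is the bridging role described above.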

So let's take a second and do a pop quiz here. I'll leave that up to give everybody a cue, and you can follow along. Well, I guess I'm not going to leave that up. Let's start with a question. If you wanted to compare one variable to a target, where would you look in the Analyze menu? I'll give you a second to think about that. Got an answer? Okay. One variable, that's the key. So we're going to look where there is one variable: Analyze, Distribution. What would we do right there? Well, let's just take some data here that I've got. This is just process data; it doesn't really matter. For the sake of the discussion, we'll just grab something and click OK. We're in the Distribution platform now. Awesome.

And if I follow my workflow, I'm going to look at my data and say, you know, it's normal enough: horseshoes and hand grenades. We could look at that more in depth if we wanted to. But we come under here for this option that says Test Mean, and I'm going to propose that my hypothesized mean is 30. When I do that, I get, down here at the bottom, my t test for the mean. Okay, so that's step one. Let's try one more. Now, what if we wanted to get correlation coefficients? So we're comparing two variables, but we want to do it across all the continuous variables in a data set. Where would you go for that? I'll give you a second to think about it again. Okay. If you said Bivariate, you could do that.
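For readers who want to see what a test of one variable against a target computes, here is a minimal Python sketch of the one-sample t statistic, using the hypothesized mean of 30 from the quiz. The function name `one_sample_t` and the sample values are made up for illustration. The sketch stops at the t statistic and degrees of freedom; the p-values that JMP also reports need the t distribution, which Python's standard library does not provide.

```python
import math
import statistics


def one_sample_t(values, hypothesized_mean):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    sample_mean = statistics.mean(values)
    sample_sd = statistics.stdev(values)   # sample SD (n - 1 denominator)
    standard_error = sample_sd / math.sqrt(n)
    t_stat = (sample_mean - hypothesized_mean) / standard_error
    return t_stat, n - 1


# Hypothetical measurements, invented for illustration only.
sample = [29, 31, 30, 32, 33]
t_stat, df = one_sample_t(sample, hypothesized_mean=30)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic is simply how many standard errors the sample mean sits from the target, which is why a single variable plus one number is all the input this question needs.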

You could do that in Bivariate. It's possible. If you say Fit Y by X, you're saying, "I'm going to compare two variables." Yes, you're comparing two variables, but you're comparing two variables, and then two variables, and then two variables, and then you've got all these combinations of two variables. So while you can do this (let's just grab a bunch of continuous variables here; again, we're not really worried about what's going on here, let's just do two), you could come into Bivariate very easily, come under the red triangle, and then come down to the density ellipse. And that will give you the correlation. I could do it there, that's true. But then you'd have to do it pairwise for every combination in this data set. If you've got 150 different combinations of 150 different continuous variables, you're going to be at that all day.

That's not what we want to do. So we come a little deeper; our problem is more complex. We're coming down further in the menu. Do we need to transform our data? No, we don't. Do we need a model? That doesn't sound quite right. Modeling, modeling, screening, possibly. But Multivariate Methods, Multivariate, might be the right place to look for this. And again, if you're just starting out with JMP, there may be some hunting and pecking, a little bit of guessing around. But the purpose of this little trick is to give you an idea about where you need to guess. So if I dump all my continuous variables in here, voilà, I get all of my correlations. And if I want to be fancy, I can even put them on this scatterplot matrix in here, just like that, and they're colored so it's easier to see what's important. All right, that's the trick for today. That's the thing we wanted to cover today. And if you're interested in this particular graphic, it's up on the blog article that comes with this. The blog article for today's content is linked in the JMP On Air segment for this.
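The difference between doing correlations one pair at a time and doing them all at once can be sketched in Python. This is only an illustration of the arithmetic, not of JMP's Multivariate platform; the function names and the three small columns are hypothetical. It also shows why the one-pair-at-a-time route does not scale: 150 continuous variables produce 11,175 distinct pairs.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def all_pairwise_correlations(columns):
    """Correlate every distinct pair of columns in one pass."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical continuous variables, invented for illustration only.
columns = {
    "temp": [1, 2, 3, 4, 5],
    "pressure": [2, 4, 6, 8, 10],
    "yield": [5, 4, 3, 2, 1],
}

correlations = all_pairwise_correlations(columns)
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:+.2f}")

# Number of distinct pairs you would have to launch by hand for 150 variables.
pairs_for_150 = math.comb(150, 2)
print(pairs_for_150)
```

One call over the whole set of columns replaces thousands of individual two-variable launches, which is the reason to look further down the menu for this question.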

So you can get to this article, and you can get to that little graphic if you need some help at any time. All right. While I'm doing this next part, if you have any questions, I've got the Q&A open here, so go ahead and type them in. Instead of taking a two-minute break, I'm going to cover a question that I got earlier in the week from someone who wanted to know how I did something in the first episode. In the meantime, if you've got questions, feel free to type them into the Q&A and I'll keep an eye on that while I'm going. All right. The question I got was: how did I make this 3D graphic? And it wasn't about the specifics of the 3D graphic itself. That's just Scatterplot 3D; that's easy to do. The question was, how did I get the data out of my images? Which is a very interesting question. And the short answer is that on Windows, this is really easy to do. So let me pull over my Windows machine. I'll enter my password here before I pull it over.

So here's JMP on Windows. On Mac, the workflow is a little bit harder because there are some differences in the underlying code. But the question here is: if I have a series of images, or TIFF images, and I want to bring those into JMP, how do I do it? Let me just show all my files here. In this case I have my image set in here in what's called a multi-file TIFF. So in this TIFF, just like in a GIF (you know what, let's leave the argument about "GIF" or "JIF" off to the side for the moment), in this single file there are multiple images. It's a movie, more or less. This is really handy for taking thicknesses. So if you're doing X-ray tomography, you can set up the image stack in here and it will play through in certain graphics software. Or it's a great way just to keep everything together when you're doing image analysis. But the question is, how do we get this data in? Well, John Ponte a few years back made a great tool called the Image Analyzer. And on the PC side, this makes the job really easy to do.

So I'm going to navigate and grab my TIFF file here. It's going to read all the bits attached to this image and spit them out as a data frame with 2 million rows, but also create some images so you can see what's going on inside of this file. So on Windows, easy as pie: go grab the Image Analyzer; it's super easy to do. On the Mac, the Image Analyzer add-in is available, but it only pulls in the first image; it won't pull in a multi-image stack. So you'll have to bring these in one at a time using the Image Analyzer and then concatenate them. Or, if you're in the process of doing image analysis, you probably know about a tool called ImageJ, and this is the way I did it. There's a macro out there. I can come in and open my data. Let's go to Open Recent.
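The underlying idea, turning a stack of images into a tall table with one row per pixel, can be sketched in plain Python. This is neither the Image Analyzer add-in nor an ImageJ macro: the stack here is a tiny hypothetical in-memory list rather than a real TIFF, the function name and column names are invented, and CSV text is used only as a convenient plain-text table form.

```python
import csv
import io


def stack_to_rows(stack):
    """Yield one (frame, row, col, intensity) tuple for every pixel in the stack."""
    for frame_index, image in enumerate(stack):
        for row_index, line in enumerate(image):
            for col_index, intensity in enumerate(line):
                yield (frame_index, row_index, col_index, intensity)


# Hypothetical stack: 2 frames, each 2 rows by 3 columns of grayscale values.
stack = [
    [[0, 10, 20], [30, 40, 50]],
    [[5, 15, 25], [35, 45, 55]],
]

rows = list(stack_to_rows(stack))

# Write the tall table as CSV text, with a header row first.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["frame", "row", "col", "intensity"])
writer.writerows(rows)
print(buffer.getvalue())
```

The row count is frames times rows times columns, which is how a modest image stack turns into a table with millions of rows.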

There's my image subset, the same data set. I can come into my macros, and I have a macro right here that does the same thing: it runs through all the images, extracts all the bits, and puts them into a CSV file that I can then bring into JMP really easily. I'll post that batch macro as well on the community later this afternoon. So that's how I built those things. Now, I'm not seeing any questions here, so we're going to go ahead and wrap up. One thing I do want to say is, again, the primary point of today's content is how you navigate the Analyze menu. Next week, we're going to spend a little bit more time on that first part of the Analyze menu, the Distribution platform, and go over some of the things you may not have known you're capable of doing in the Distribution platform.
