Exploring Text Explorer

Learn some of the ways you can work with your text data in JMP and JMP Pro.

Full Transcript (Automatically Generated)

Good morning, or good afternoon, everyone. My name is Mike Anderson. I'm a systems engineer for New York. Today I'm going to be showing you a few of the things that I love about Text Explorer. I got really passionate about this particular tool back when it was first introduced, and it's been kind of a hobby for me to go around with it and play with Text Explorer a little bit.

For the sake of argument here, I'm going to use some data that I've never used before. I was working on this over the weekend, trying to pull this data set together. Being stuck at home like everybody else, I've been going through some of my old favorite TV shows. One of those, back in grad school, was Futurama. So I've pulled all of the plots and plot summaries from Wikipedia for all of the Futurama episodes, and that's what we're going to be analyzing today. Now, before we go into this in too much detail, let me give you a little bit of an idea about what Text Explorer is for, and it all lines up in that last name. In JMP, there are two platforms that are called explorers: there is Text Explorer, and there's Functional Data Explorer. While these platforms can do an analysis on their own, one of their primary features is to transform data from a format that is usually tricky to analyze into something that's easier to work with in JMP.

So, that said, we can do some very interesting things in Text Explorer. One of the things that I love about it is its ability to transform text data into something that I can do sentiment analysis with. There are some resources on that; I'm not going to go down that road. That's a whole other rabbit hole to go down. But one of the things that I love about Text Explorer is that facility to take text, which is challenging to analyze in a statistical package, and turn it into something that we can analyze, that we can model, that we can crunch in a more comfortable setting. So let's go ahead and get started with this. I'm going to show you first how to get the platform set up. We'll go into Analyze > Text Explorer.

And it's as simple as dropping in the free text that you've got in your data set. Once we go in there, we have a bunch of options that we can use to filter down the data: we can get rid of really, really short words, we can work with phrases, and things like that as well. There's a great tutorial by Nick Shelton on how to do all of the basics that's out on the web, so I'm going to skip over that and get to kind of the highlight reel of the things I love about Text Explorer in general.

The first thing that I love is the word cloud. When people talk about text analysis, this is the first thing that pops to mind. It can be impactful, it can be incredibly powerful. It's a great tool for visualizing this data, and it makes it kind of easy for people to consume. In this case, the first thing we see out of the gate is the main characters: we see Philip J. Fry, we see Bender, we see Leela. These main characters are in almost every episode, so it's not surprising that they're going to show up a lot in the plot summaries. Now, removing that information is really easy. I can right-click on something and say add a stop word, and that excludes that word from analysis. If I do that by removing just these three, Fry, Bender, and Leela, I get to something where I can start picking out different themes that are present within the data set.

Now, I've colored this based on the overall Internet Movie Database rating for the individual episodes. And you can see that, in some cases, things that involve Farnsworth or involve robots tend to be a little bit lower on the rating scale than things that involve the professor or, for instance, time travel, or some of the more out-there sci-fi type topics that are associated with this show.

And that brings me to the next thing that I love about this platform, and this is a Pro capability. What I love is the ability, once I've gotten rid of the words that don't necessarily matter and got the data set curated, to come in and do something called latent semantic analysis. That's a lot of words to say we're going to analyze all of these terms that we've looked at and turn them into things that we can look at for trends, or phrases and words that tend to show up together. And that's what we've got down here.
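For readers who want to see the stop-word step from the word cloud discussion outside of JMP, here is a minimal Python sketch. It is not how Text Explorer is implemented; the summaries, the stop-word list, and the function name term_counts are all invented for illustration. It counts term frequencies (the quantity a word cloud sizes its words by) after excluding stop words, including the three main characters.

```python
import re
from collections import Counter

# Toy plot summaries, invented for illustration (not the Wikipedia text).
summaries = [
    "Fry and Bender travel through time with the professor",
    "Leela and Fry meet robots on a strange planet",
    "Bender leads the robots while Leela repairs the ship",
]

# Hypothetical stop-word list: a few function words plus the main characters.
stop_words = {
    "and", "the", "with", "on", "a", "while", "through",
    "fry", "bender", "leela",
}

def term_counts(documents, stop_words):
    """Count each term across all documents, skipping stop words."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

counts = term_counts(summaries, stop_words)
print(counts.most_common(3))
```

With the character names excluded, the remaining counts are dominated by plot terms such as "robots", which is the effect described in the demo.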

Now, in this case we don't see a lot. Futurama as a show was kind of all over the place; there was no real formulaic element to it in terms of the different plot elements that it had. So seeing a lot of interesting things in here wasn't something I was expecting. What I was curious about was how the topic analysis, which is one of the options under SVD, would be able to handle this kind of data. What it does in this case is start picking out individual episodes, not based on the row of information, but just based on the words and phrases that tend to show up together. It starts reconstructing the synopses for different episodes. And that's kind of a good idea of what topics are in this case: they're not necessarily topics, they're words and phrases that tend to show up together very frequently.
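To make "words that tend to show up together" concrete, here is a rough Python sketch of the idea behind latent semantic analysis: build a document-term matrix and look at its leading singular vector. This is only an illustration under stated assumptions. The documents are invented, the helper leading_singular_pair is hypothetical, and it uses plain power iteration to find just the first singular vector pair, whereas a full SVD (as a package like JMP Pro would compute) returns several.

```python
import math
import re

# Toy survey responses, invented for illustration (not the JMP sample data).
docs = [
    "guard dog barks loudly",
    "dog barks at strangers",
    "big dog barks outside",
    "cat sleeps all day",
    "lazy cat sleeps inside",
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

vocab = sorted({t for d in docs for t in tokenize(d)})
# Document-term matrix: one row per document, one column per term (raw counts).
dtm = [[tokenize(d).count(term) for term in vocab] for d in docs]

def leading_singular_pair(matrix, iterations=200):
    """Power iteration on A^T A.

    Returns (document scores, term loadings) for the largest singular value.
    """
    n_docs, n_terms = len(matrix), len(matrix[0])
    v = [1.0] * n_terms  # starting vector
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # u = A v
        v = [sum(matrix[i][j] * u[i] for i in range(n_docs))       # v = A^T u
             for j in range(n_terms)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    scores = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    return scores, v

scores, loadings = leading_singular_pair(dtm)
# Terms with the largest loadings are the ones that co-occur most strongly.
ranked = sorted(zip(vocab, loadings), key=lambda pair: -pair[1])
print([term for term, _ in ranked[:2]])
print([round(s, 3) for s in scores])
```

In this toy data the dog-related responses share more vocabulary than the cat-related ones, so the leading vector loads on the dog terms and scores the dog documents highly while the cat documents score near zero. That grouping of documents by shared terms is the same kind of structure the SVD plots in the demo expose.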

And you can use that with your own personal insight, with your subject matter expertise, to read into that and start figuring out things that are important within this data set. Let me show you one where we've got some common themes that tend to show up a lot within a data set. This is the same type of analysis. This is actually one of the sample data sets in JMP, and it is looking at a survey of pets.

And what we can see here is that this top plot now actually has some structure to it; it has some things that are interesting. The way I explain this to people is to look at each one of these tendrils that go off along each side. You can see there are three of them here, and you can see one, two, maybe three, maybe four over here, for three or four different topics, or three or four different common sets of documents that show up within the data set. This helps you when you're looking for commonality, for common themes that show up within a data set, like survey analysis, or when you're analyzing product reviews, or when you're looking at your social media. Those kinds of common threads and themes can provide insight for you, and we can come in and just grab them really quickly.

Here I'm using the lasso tool, and I can pull up the text and look in here and see what's going on. Aside from the strange itinerant cat entry here, and there's a couple of them, this thread is talking about guard dogs. And we can get that just from the different terms that tend to show up together within the data set.

And that's the crux of the things that I like about this platform: the ability to visualize a very challenging data set with the word cloud; the ability to dig into that data set for different themes or trends you may not know are there; and the third, as I said first, is its ability to transform data from one form to another to let us do things like sentiment analysis. The way we do that in this case is we come up to the red triangle, and we come down to where it says Save Document Term Matrix.
What that will do is take all of the terms that we've kept in our data set and create columns within the original data set for each of those terms, and those become the factors that we use when we're building a model for, for instance, sentiment analysis. This is a bit of a high-level overview. There are some great tools for working with sentiment and working with Text Explorer out there on the web. There are two Mastering JMP seminars already recorded; you can go and look at those today. Those links are going to be in the community page for this episode of JMP On Air. Julian, I'll take it back to you now.
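As a closing illustration of the document term matrix step described above, here is a small Python sketch of what "create columns for each of those terms" means. It is not JMP's implementation; the reviews, the sentiment labels, the term_ column prefix, and the function add_term_columns are assumptions made up for this example.

```python
import re

# Toy reviews with hand-assigned sentiment labels, invented for illustration.
rows = [
    {"text": "great product works great", "sentiment": 1},
    {"text": "terrible product broke fast", "sentiment": 0},
]

def add_term_columns(rows, text_key="text", binary=False):
    """Append one column per term to each row, alongside the original columns."""
    tokenized = [re.findall(r"[a-z]+", r[text_key].lower()) for r in rows]
    vocab = sorted({t for toks in tokenized for t in toks})
    table = []
    for row, toks in zip(rows, tokenized):
        new_row = dict(row)  # keep the original columns untouched
        for term in vocab:
            count = toks.count(term)
            # binary=True records presence/absence instead of the raw count
            new_row[f"term_{term}"] = int(count > 0) if binary else count
        table.append(new_row)
    return table, vocab

table, vocab = add_term_columns(rows)
print(vocab)
print(table[0])
```

Each term_ column can then serve as a candidate factor in a model with sentiment as the response, which is the use the talk describes for the saved matrix.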
