Choose Language Hide Translation Bar

Let's talk tables (2020-US-45MP-549)

Mandy Chambers, JMP Principal Test Engineer, SAS
Kelci Miclaus, Senior Manager Advanced Analytics R&D, JMP Life Sciences, SAS, JMP LIfe Sciences

 

JMP has many ways to join data tables. Using traditional Join you can easily join two tables together. JMP Query Builder enhances the ability to join, providing a rich interface allowing additional options, including inner and outer joins, combining more than two tables and adding new columns, customizations and filtering. In JMP 13, virtual joins for data tables were developed that enable you to use common keys to link multiple tables without using the time and memory necessary to create a joined (denormalized) copy of your data. Virtually joining tables gives a table access to columns from the linked tables for easy data exploration. In JMP 14 and JMP 15, new capabilities were added to allow linked tables to communicate with row state synchronization. Column options allow you to set up a link reference table to listen and/or dispatch row state changes among virtually joined tables. This feature provides an incredibly powerful data exploration interface that avoids unnecessary table manipulations or data duplications. Additionally, there are now selections to use shorter column names, auto-open your tables and a way to go a step further, using a Link & ID and Link & Reference on the same column to virtually “pass through” tables. This presentation will highlight the new features in JMP with examples using human resources data followed by a practical application of these features as implemented in JMP Clinical. We will create a review of multiple metrics on patients in a clinical trial that are virtually linked to a subject demographic table and show how a data filter on the Link ID table enables global filtering throughout all the linked clinical metric (adverse events, labs, etc.) tables.

 

 

Auto-generated transcript...

 

Speaker

Transcript

Mandy Okay, welcome to our discussion today. Let's Talk Tables. My name is Mandy Chambers and I'm a principal test engineer on the JMP testing team.
And my coworker and friend joining me today is Kelci Miclaus. She's a senior manager in R&D on the JMP life sciences team.
Kelci and I actually began working together a few years ago as a Clinical product was starting to be a great consumer of all the things that I happen to test.
So I got to know her group pretty well and got to work with them closely on different things that they were trying to implement.
And it was really valuable for me to be able to see a live application that a customer would really be using the things that I actually tested in JMP and how they would put them to use them in the clinical product.
So in the past, we've done this presentation. It's much longer and we decided that the best thing to do here was to
give you the entire document. So that's what's attached with this recording, along with two sets of data and zip files. You should have data tables, scripts,
some journals and different things you need and be able to step through each of the applications, even if I end up not showing you or Kecli doesn't show you something that's in there. You should be able to do that. So let me begin by sharing my screen here.
So that you can see what what I'm going to talk about today. So as I said, the, the journal that I had, if I were going to show this in its entirety, would be talking about joining tables and the different ways that you can join tables.
And so this is the part that I'm not going to go into great detail on but just a basic table join. If I click on this,
laptop runs and laptop subjects. And under the tables menu, if you're new to JMP or maybe haven't done this before,
you can do a table join and this is a for physical join. This will put the tables together. So I would be joining laptop runs to laptops subjects.
Within this dialogue, you select the things that you want to join together. You can join by matching, Cartesian join, row join and then you would join the table. I'm not going to do that right now, just for
time consumption but that's that's what you would do. And also in here under the tables menu, something else that I would talk about would be JMP query builder.
And this has the ability to be able to join more tables together. It will, if you have 3, 4, 5, 6 however many tables you have, you can put them together and we'll make up one table that contains everything. But again, I'm actually not going to do that today.
So if I
go back into here and I close these tables.
Let's get started with how virtual join came about. So let's talk about joining tables first.
You have to decide what type of join you want to use. So your...if you're tables are small, it might be easiest to do a physical join. To just do a tables join, like the two tables I showed you weren't very big.
If you pull in three or four maybe more tables, JMP query builder is a wonderful tool for building a table.
And you may want all of your data in the same table so that may be exactly what you want. You just need to be mindful of disk space
and performance, and just understand if you have five or six tables that you have sitting separately and then you join them together physically, you're making duplicate copies.
So those are the ways that you might determine which which you would use. Virtual join came about in JMP 13 and it was added with the ability to
take a link, a common link ID, and join multiple tables together. It's kind of a concept of joining without joining. It saves space and it also saves duplication of data.
And so that...in 13 we we started with that. And then in 14 to 15, we added more features, things that customers requested. Link tables with rows synchronize...rows states synchronization. You can shorten column names. We added being able to auto open linked tables.
Being able to have a link ID and a link reference on the same column. And we also added these little hover tips that I'll show you where it can tell you which source is your column source table.
So those are the things that we added and I'm going to try to set this up and demonstrate it for you. So I've got this data that I actually got from a...
it's just an imaginary high-tech firm. And it's it's HR data and it includes things such as compensation, and headcount, and some diversity, and compliance,
education history, and other employment factors. And so if you think about it, it's a perfect kind of data to link because you have
usually a unique ID variable, such as an employee ID or something that you can link together and maybe have various data for your HR team that's in different places.
So I'm going to open up these two tables and just simply walk through what you would do if you were trying to link these together.
So this table here is Employee Scores 1 and then I have Compensation Master 1 in the back. These tables, Employees Scores 1 is my source table.
And Compensation Master is my referencing table. So you can see these ID, this ID variable here in this table. And it's also in the compensation master table.
So I'm going to set up my link ID. So it's a couple of different ways to do this. You can go into column properties.
And you can see down here, you have a link ID and reference. The easiest way to do this is with a right click, so there's link ID.
And if I look right here, I can see this ID key has been assigned to that column.
So then I'm going to go into my compensation master table. And I'm going to go into this column. And again, you can do it with column properties.
But you can do the easiest way by going right here to link reference, the table has the ID. So it shows up in this list. I'm going to click on this and voila, there's my link reference
icon right there. And I can now see that all the columns that were in this table are...are available to me in this table. You can see you have a large number of columns.
You can also see in here that you have...they're kind of long column names, you have the column names, plus this identifier right here which is showing you that this is a referencing column.
And so I'm going to run this little simple tabulate I've saved here and just show you very briefly that this is a report and just to simply show you this is a virtual column length of service.
And then compensation type is actually part of my compensation table and then gender is a virtual column. So I'm using...building this using virtual columns and also columns that reside in the table.
One thing I wanted to point out to you very quickly is that under this little red triangle...let's say you're working with this data and you decide,
"Oh, I really want to make this one table. I really want all the columns in one table." There is a little secret tool here called merge reference data.
And a lot of people don't know this is there, exactly. But if I wanted to, I could click that and I can merge all the columns into
this table. And so, but for time sake, I'm not going to do that right now, but I wanted to point out where that is located.
And let me just show you back here in the journal, real quickly. This is possible to do with scripting, so you can set the property link reference and point to your table and
list that to use the link columns. So I'm going to close this real quickly and then go back to the same application where I actually had
same two tables that I've got some extra saved scripts in here, a couple more things I want to show. So again, I've got employee scores. This is my source table.
And then I've got compensation master and they're already linked and you can see this here. So I want to rerun that tabulate and I want to show you something.
So you can see that these column names are shorter now. So I want to show what we added in JMP 14. If I right click and bring up the column info dialog,
I can see here that it says use linked column names right here. And that sets that these these names will be shorter
And that's really a nice feature because when, at the end of the day, when you share this report with someone, they don't really care where the columns are
coming from, whether they're in the main table or virtual table. So it's a nice, clean report for you to have. The script is saved so that you can see in the script that it's...
it saves the script that shows you a referencing table. So if I look at this, I can see. So you would know where this column is coming from but
somebody you're sharing with doesn't necessarily need to know. So I want to show you this other thing that that that we added with this dispatching of row states.
Real quick example, I'm going to run this distribution. And you notice right away that in this distribution, I've got a title that says these numbers are wrong.
And so let me point out what I'm talking about. Employee scores is my employee database table and it has about 3,600 employees.
This is a unique reference to employees and it's a current employee database, let's say. My compensation master table is more like a history table and it has 12,000
rows in it, so it has potentially in this table, multiple references to the same employee, let's say, an employee changed jobs and they got a raise, or they moved around.
Or it could have employees that are no longer in the company. So running this report from this table
doesn't render the information that I really want. I can see down here that my count is off, got bigger counts, I don't exactly have what I was looking for.
So this is one of the reasons why we created this row states synchronization and Kelci is going to talk a little bit more about this in a real life application, too.
But I'm just simply going to show you this is how you would set up dispatching row states. So what I'm doing is I'm just batching,
selection color marker. And what I'm doing is I'm actually sending from compensation master to employee scores,
I'm sending the information to this table because (I'm sorry), this is the table that I want my information to be run from.
So if I go back and I rerun that distribution, I now have this distribution (it's a little bit different columns), but I have this distribution. And if I look at the numbers right here,
I have the exact numbers of my employee database. And that's exactly what I wanted to see. So you need to be careful with dispatching and accepting and Kelci will speak more to that. But that was just a simple case example of how you would do that.
And I will show you real quickly, that there is a Online Help link that shows
an example of virtually joining columns and showing row states. It'll step you through that. There's some other examples out here too
of using virtual join. If you need more information about setting this up. And again, just to remind you, all of this is scriptable. So you can script this right here, by setting up your row states and the different things that you want with that.
So as we moved into JMP 15 we added a couple more things. And so what we added was we we added the ability to auto open a table and also to hover over columns and figure out where they're coming from. And I'll explain what that what that means exactly.
So if I click on these. We created some new tables for JMP 15, employeemaster.jmp, which is still part of this HR data. And so if I track this down a little bit and look, a couple things I'll point out about this table.
It has a link ID and a link reference. And that was the other thing we added to to JMP 15, the ability to be able to have a link ID and link reference on the same column.
So if I look at this and I go and look at my home window here, I can see that there's two more tables that are open. They were opened automatically for me.
And so I'm going to open these up because I kind of want to string them out so you can see how this works. But this employee master table is linked to
a...stack them on top of each other...it's linked to the education history table, which has been, in turn, linked to my predicted termination table. And you can see
there's an employee ID that has a link reference and the link ID, employee ID here. Same thing, and then predict determination has an ID only.
And if you had another table or two that had employee ID unique data and you needed to pull it in, you could continue the string on by assigning a link reference here and you can keep on keep on going. So I'm...just to show you quickly, if I right click and look at this column here,
I can see that my link ID is set, I can also see my link reference is set. And it tells me education history is a table that this table is linked to.
I've got it on auto open and I've got on the shorter names. I'm not dispatching row states, so nothing is set there. So all of the columns that are in these other two tables are available to me, for my referencing table here called employee master.
And real quickly, you can see that you have a large number of columns in here that are available to you, and the link columns show up as grouped columns down here.
So another question that got asked from customers, as they say, is there any way you can tell us where these columns come from
so that is a little clearer? So we added this nice little hover tip. If I hover over
this, this tells me that this particular column disability flag is coming from predicted termination. So it's actually coming from the table that's last in my series.
And if I go down here and I click on one of these, it says the degree program code is coming from education history.
So that's, that's a nice little feature that will kind of help you as you're picking out your columns, maybe in what you're trying to run with platforms and so forth.
But if I run this distribution, this is just a simple distribution example that's showing that employee level
is actually coming from my employee master table. This degree description is coming from education history table and this performance eval is coming from my predictive termination table. And then you can look some more with
some of these other examples that are in here. I did build
a context window of dashboards here that shows a Graph Builder showing a box plot. We have a distribution in here, a tabulate and a heat map,
using all virtual columns, some, you know, some columns that are from the table, but also virtual columns got a filter. So if I want to look at females and look at professionals.
I always like to point out the the oddities here. So if I go in here and look at these two little places that are kind of hanging out here. This is very interesting to me because comp ratios shows how people are paid.
Basically, whether they're paid in in the right ratio or not it for their job description. And it looks like these two outliers are
consistently exceeding expectations, that looks like they're maybe underpaid. So just like this one up here
is all by itself and it looks like they seldom meet their expectations, but they may be slightly overpaid and or they could be mistakes. But at any rate, as you zero in on those, you can also see that the selections are being made
here. So, in this heat map, I can tell that there is some performance money that's being spent and training dollars. so maybe train that person. So that's actually good good good to see
So that is
about all I wanted to show. I did want to show this one thing, just to remind, just to reiterate. Education history has access to the columns that are in predicted termination.
And so those two tables can talk to each other separately. And if I run this graph script,
I have similar performance and training dollars, but I'm looking at like grade point average, class rank, as to where people fall into the limits here using combinations of columns from just those two tables.
So I'm going to pass this on. I believe that was the majority of what I wanted to share. I'm going to stop sharing my screen. And I will pass this back to Kelci and she will take it from here.
Kelci J. Miclaus Thanks, Mandy.
Mandy said we've given this talk now a couple times and, really it was this combined effort of
me working in my group, which is life sciences for the JMP Clinical and JMP Genomics vertical solutions,
and finding such perfect examples of where I could really leverage virtual joins and working closely with the development team on how those features were released in the last few versions of JMP.
And so for this section I will go through some of the examples, specific to our clinical research and how we've really leveraged this talking table idea around row state synchronization.
So as as Mandy mentioned this is now, and if we have time towards the end, this, this idea of virtual joins with row state synchronization is now the entire architecture that drives how JMP Clinical reports and reviews are used
for assessing early efficacy and safety and clinical trials reports with our customers. And one of the reasons it fits so well is because of the formatting of typical clinical trial data.
So the data example that I'm going to use for all of the examples I have around row state synchronization or row state propagation as I sometimes call it,
are example data from a clinical trial that has about 900 patients. It was a real clinical trial carried out about 20-30 years ago
looking at subarachnoid hemorrhage and treatment of nicardipine on these patients.
The great thing about clinical data is we work with very standard normalized data structures, meaning that each component of a clinical trial is collected, similar to the HR data that Mandy showed...show...showed us
is normalized, so that each table has its own content and we can track that separately, but then use virtual joins to create comprehensive stories.
So the three data sets I'll walk through are this demography table which has about a little under 900 patients
of clinical trials, where here we have one row per patient in our clinical trial. And this is called the demography, that will have information about their birth, age, sex, race, what treatment they were given, any certain flags of occurrences that happened to them during the clinical trial.
Similarly, we can have separate tables. So in a clinical trial, they're typically collecting at each visit what adverse events have happened to a patient while on on a new
drug or study. And so this is a table that has about 5,000 records. We still have this unique subject identifier, but we have duplications, of course. So this records every event or adverse event
that was reported for each of the patients in our clinical trial. And finally I'll also use a laboratory data set or labs data set,
which also follows the similar type of record stacked format that we saw on the adverse events.
Here we're thinking of the regular visits, where they take several laboratory measurements and we can track those across the course of the clinical trial to look for abnormalities and things like that.
So these three tables are very a standard normalized format of what's called the international CDISC standard for clinical trial data.
And it suits us so well towards using the virtual join. Aas Mandy has said, it is easy to, you know, create a merge table of labs.
But here we have 6,000 records of labs and merging in our demography, it would cause a duplication of all of their single instances of their demographic descriptions. And so we want to set up a virtual join with this, which we can do really easily.
If we create in our demography table, we're going to set up unique subject identifier as our link ID.
And then very quickly, because we typically would want to look at laboratory results and use something like the treatment group they are on to see if there's differences in the laboratories,
we can now reference that data and create visualizations or reports that will actually assess and look at treatment group differences in our laboratory results.
And so we didn't have to make the merge. We just gained access to these...this planned arm column from our demography table through that simple two-step setting up the column properties of a virtual join.
It's also very easy to then look at like lab abnormalities. So here's a plot by each of the different arms or treatment groups who had abnormally high lab tests across visits in a clinical trial.
We might also want to do this same type of analysis with our adverse event, which we would also want to see if there's different occurrences in the adverse events
between those treatment groups. So once again we can also link this table to our referenced demography and very quickly create counts of the distribution of adverse events that occur separately for, say, a nicardipine, the active treatment, versus a placebo.
So now we want them to really talk. And so the next two examples that I want to show with these data are the row state synchronization options we have.
So you quickly saw from Mandy's portion that she showed that on the column properties we have the ability to synchronize row states now between tables.
Which is really why our talk is called talking tables, because that's the way they can communicate now. And you can either dispatch row states, meaning the table that you're set up the reference to some link ID can send information from that table back to its reference ID table.
And I'll walk through a quick example, but as mentioned...as Mandy mentioned, this is by far the more
dangerous case sometimes because it's very easy to hit times when you might get inconclusive results, but I'm going to show a case where it works and where it's useful.
As you've noticed, just from this analysis, say with the adverse events, it was very easy as the table that we set up a link reference to (the ID table) to gain access to the columns and look at the differences of the treatment groups in this table.
There's not really anything that goes the other way though. As Mandy had said, you wouldn't want to use this new join table to look at a distribution of, say, that treatment group,
because what you actually have here is numbers that don't match. It looks like there's 5,000 subjects when really, if you go back to our
demography table, we have less than 900. So here's that true distribution of about the 900 subjects by treatment group
with all their other distributions. Now, there is the time, though, that this table is what you want to use as your analysis table or the goal of where you're going to create an analysis.
And you want to gain information from those tables that are virtually linked to it. The laboratory, for example, and the adverse events.
So here we're going to actually use this table to create a visualization that will annotate these subjects in this table with anyone who had an abnormal lab test or a serious adverse event.
And now I've cheated, because I've prepared this data. You'll notice in my adverse events data
I've already done the analysis to find any case of subjects that were...any adverse events that were considered serious and I've used the row state marker to annotate those records that had...were a serious adverse event.
Similarly, in the labs data set,
I've used red color to annotate...annotate any of the lab results that were
abnormally high. So for example, we can see all of those that had high abnormalities. I've colored red most of this through, just row state selection and then controlling the row states.
So with this data where I have these two row states in place, we can go back to our demography table
and create a view
that is a distribution by site of the ages of our patients in a clinical trial.
And now if we go back to each of the linked tables, we can control bringing in this annotated information with row state synchronization.
So we're going to change this option here from row states with reference table to none, to actually to dispatch and in this case I want to be careful. The only thing I want this table to tell that link reference table is a marker set.
I'm going to click Apply
And you'll notice automatically my visualization that I created off that demography table now has the markers of any subjects who had experienced an adverse event from that other table.
We can do the same now with labs.
Choose to dispatch. In this case, we only want to dispatch color.
And now, just by controlling column properties, we're at a place where we have a visualization or an analysis built off our demography table that has gained access to the information from these virtually joined tables using the dispatch row state synchronization or propagation.
So that's really cool. I think it's a really powerful feature. But there are a lot of gotchas and things you should be careful with with the dispatch option.
Namely the entire way virtual joins work is
the link ID table, the date...the data table you set up, in this case demography, is one row per ID and you're using that to merge or virtually join into a
data table that has many copies of that usage ID. So we're making a one-to-many; that's fine. Dispatch makes a many-to-one
conversation. So in in the document we have an ...in the resource provided with this video, there's a lot of
commentary about carefully using this. It shouldn't be something that's highly interactive.
If you then decide to change row states, it can be very easy for this to get confusing or nonsensical,
that, say if I've marked both with color and marker, it wouldn't know what to do because it was some rows might be saying, "Color this red,"
but the other linked table might be saying color it blue or black. So you have to be very careful about not mixing and matching and not being too interactive with with that many-to-one merge idea.
But in this example, this was a really, really valuable tool that would have required quite a lot of data manipulation to get to this point.
So I'm going to close down these examples of the dispatch virtual join example and move on to likely what's going to be more commonly used is the accept...
acceptance row state of the virtual join talking tables. And for this case, I'm actually going to go through this with a script. So instead of interactively walking me through the virtual join and
row state column properties, we're going to look at this scripting results of that.
And the example here, what we wanted to do, is be able to use these three tables (again, the demography, adverse events and laboratory data in a clinical trial) to really create what they call a
comprehensive safety profile. And this is really the justification and rationale of our use in JMP Clinical for our customers.
This idea that we want to be able to take these data sets, keep them separate but allow them to be used in a comprehensive single analysis so they don't feel separate.
So with this example, we want to be able to open up our demography and set it up as a link ID. So this is similar to what I just did interactively that will create the demographic table and create the link ID column property on unique subject identifier.
So we're done there. You see the key there that shows that that's now the link ID property. We then want to open up the labs data set.
And we're going to set a property on the unique subject identifier in that table to use the link reference to the demography table.
And a couple of the options and the options here. We want to show that that property of using shorter names. Use the linked column name to shorten the name of our
columns coming from the demography table into the labs table. And here we want to set up row state synchronization as an acceptance of select, exclude and hide.
And we're going to do this also for the AE table. So I'll run both of these next snippets of code,
which will open up my AE and my lab table. And now you'll see that instead of that dispatch the properties here are said to set to accept
with these select, exclude and hide. And similarly the adverse events table has the exact same acceptance. So in this case now, instead of this dispatch, which we were very careful to only dispatch
one type of row state from one table and another from another table back to our link ID reference table. Here we're going to let our link ID reference table demography broadcast what happens to it
to the other two tables And that's what accept does. So it's going to accept row states from the demography table.
And I've cheated a little bit that I actually just have a script attached to our demography table here
that is really just setting up some of the visualizations that I've already shown that are scripts attached to each of the table in a single window.
And so here we have what you could consider your safety profile. We have distributions of the patient demographic information. So this is sourced from the demography table.
You see the correct numbers of the counts of the 443 patients on placebo versus the 427 on nicardipine.