cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
JMP is taking Discovery online, April 16 and 18. Register today and join us for interactive sessions featuring popular presentation topics, networking, and discussions with the experts.
Choose Language Hide Translation Bar
marc1
Level III

Combining multiple datasheets into one mastersheet - Can "Table Join" create with creating multiple columns" ?

Hello -

 

I usually work with big datasets with multiple sheets provided in xls format.

I usually prework those sheets in excel with pivot tables and match functions to combine the multiple lists into into one mastersheet , so that every individual row represents one individual case with multiple variables in columns. 

A sample is displayed here - ID as unique identifier - weight - [Code 101, Code 102, ... are variables from another sheet]

sa2.jpg

 

Here is the data how it looks - unsorted (!) in the other sheet - a specific matching ID , but every specific variable (code) is presented in an extra row, potentially with same matching IDs.

 

 

sa1.jpg

What I want JMP to do, is to add those variables (in picture 2) to individual columns to the mastersheet (picture 1). It would automatically add the column names for the individual variable , and also add 1/0 if the code is present yes or no for the unique identifier (ID). The matching variable is ID.

 

When I use "Table Join" command and select match ID=ID , then I get as new output sheet individual columns with same IDs, which are not helpful for further analysis for me. I would need those multiple columns stacked behind the present set of variables in the mastersheet.

 

The problem gets more tricky, as not all of the unique IDs have listed codes (from sheet 2), so sometimes it may be just 0 for all. Or some have olnly 1 code, while others have multiple.

 

Thanks a lot for finding a solution for me ! Marc

 

1 ACCEPTED SOLUTION

Accepted Solutions
julian
Community Manager Community Manager

Re: Combining multiple datasheets into one mastersheet - Can "Table Join" create with creating multiple columns" ?

Hi @marc1,

I believe there's a straightforward way to do this without involving any scripting. Let's start with Table A and Table B (attached) that resemble your situation. tablea.pngtableb.png

What we're going to need is a table like the following to join with Table A:split.png

So how do we get there? What we'll be doing is using Tables > Split on Table B, using a version of the Code column as the Split By column, and using a Column of 1s as the Split Column. First, we need to make a version of the Code column that has "Code: " as text in the beginning (this will keep clear the column names we end up making, but isn't strictly necessary). Here's some JSL to add such a column to your table:

 

New Column("Code Text", Character, "Nominal", Formula("Code " || Char(:Code)));

 

This uses a column formula to prepend "Code: " to each code value. Now, we need a column that will be the values in the final table. I'll make a column called "Present" and set the values to all ones:

 

New Column("Present", Numeric, "Continuous", Format("Best", 12), Formula(1));

Table B looks like this now: 

tableb-2.pngNext, we "Split" this table to get it into a wide format with one row per ID. We'll use Tables > Split, and use :Code Text as the Split By column, :Present as the Split Columns, and :ID as the Group column. If you're using a Formula column (as I did) for the Split Column, be sure to uncheck "Copy Formula."split dialog.png

 

 

This returns: st.png

Which is nearly there -- we have blanks where the code was not present, so we need to replace those with 0s. We can do this by selecting all the Code columns, then using Cols > Standardize Attributes. Then, click Recode at the top, and change . to 0. Another option, which I find to be very fast and perfectly adequate here is using a simple Find and Replace -- Edit > Search > Find: Use nothing as the "find what" and 0 as the replace with, and check "Match entire cell value." Click Replace All, and you're done.find replacew.png

Now we have a table with the IDs, and the presence or absence of the codes. The last thing to do is bring those values into Table A. We can use Tables > Join, or Tables > Update. I'll use Tables > Update here: update.png

 

And there we go!final.png

 

Including the missing values for IDs I and J that I did not have in the Codes table (which means this method is good for noticing when you don't have data for certain items).  I hope this helps get you started in your situation. 

 

@julian 

View solution in original post

2 REPLIES 2
julian
Community Manager Community Manager

Re: Combining multiple datasheets into one mastersheet - Can "Table Join" create with creating multiple columns" ?

Hi @marc1,

I believe there's a straightforward way to do this without involving any scripting. Let's start with Table A and Table B (attached) that resemble your situation. tablea.pngtableb.png

What we're going to need is a table like the following to join with Table A:split.png

So how do we get there? What we'll be doing is using Tables > Split on Table B, using a version of the Code column as the Split By column, and using a Column of 1s as the Split Column. First, we need to make a version of the Code column that has "Code: " as text in the beginning (this will keep clear the column names we end up making, but isn't strictly necessary). Here's some JSL to add such a column to your table:

 

New Column("Code Text", Character, "Nominal", Formula("Code " || Char(:Code)));

 

This uses a column formula to prepend "Code: " to each code value. Now, we need a column that will be the values in the final table. I'll make a column called "Present" and set the values to all ones:

 

New Column("Present", Numeric, "Continuous", Format("Best", 12), Formula(1));

Table B looks like this now: 

tableb-2.pngNext, we "Split" this table to get it into a wide format with one row per ID. We'll use Tables > Split, and use :Code Text as the Split By column, :Present as the Split Columns, and :ID as the Group column. If you're using a Formula column (as I did) for the Split Column, be sure to uncheck "Copy Formula."split dialog.png

 

 

This returns: st.png

Which is nearly there -- we have blanks where the code was not present, so we need to replace those with 0s. We can do this by selecting all the Code columns, then using Cols > Standardize Attributes. Then, click Recode at the top, and change . to 0. Another option, which I find to be very fast and perfectly adequate here is using a simple Find and Replace -- Edit > Search > Find: Use nothing as the "find what" and 0 as the replace with, and check "Match entire cell value." Click Replace All, and you're done.find replacew.png

Now we have a table with the IDs, and the presence or absence of the codes. The last thing to do is bring those values into Table A. We can use Tables > Join, or Tables > Update. I'll use Tables > Update here: update.png

 

And there we go!final.png

 

Including the missing values for IDs I and J that I did not have in the Codes table (which means this method is good for noticing when you don't have data for certain items).  I hope this helps get you started in your situation. 

 

@julian 

marc1
Level III

Re: Combining multiple datasheets into one mastersheet - Can "Table Join" create with creating multiple columns" ?

Hello Julian, many thanks for this ! This works beautiful !