Subscribe Bookmark RSS Feed

How do I join two data tables without losing data?

ShannonK

Occasional Contributor

Joined:

Nov 3, 2016

I am trying to Join two tables. I am using three common Source Columns as matches. When I get the new output table, it is missing data. I have selected 'Merge same name columns' and 'Include non-matches' before creating the new table. The three Source Columns appear to be exact matches in the rows that have missing data. What would cause me to lose data when creating the new table? I am not scripting and would prefer an interactive explanation referring to the menu items and GUI windows.

Thanks for any tips.

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson

Super User

Joined:

Jun 22, 2012

Solution
that would clearly cause a problem
Jim
11 REPLIES
txnelson

Super User

Joined:

Jun 22, 2012

are you specifying to "Include Non Matches" for both the Main Table and the With Table?

Also, sometimes the matching columns can have an issue when there are preceding blanks in the data.
Jim
markschahl

Community Trekker

Joined:

Jun 18, 2012

I get this and other table manipulation questions a lot from my users. So much so, that I added a "How to" page on this to my internal JMP Sharepoint site. Here are some helpful videos/links:

 

Splitting Columns

Joining Tables

Blog Post on Stacking

Stacking Columns

 

One thing that i have learned: if the dialog box has a checkbox for "keep dialog open", check it - you likely get it wrong the first time...

 

 

ShannonK

Occasional Contributor

Joined:

Nov 3, 2016

Thanks for the idea Jim. I did specify "Include Non Matches" for both tables. The columns that I am using to match the tables don't have blanks in them, but the data in their rows do. Does that matter? I considered that as I looked at which rows had complete data and which ones had missing data and I can't sort any logic out based on the blanks.

 

Here is some more information about my issue. I have included a screen shot that shows three tables. On top is the Main table, next is the With table, and finally the output table. The top row of each table is for case 2001-2-14F (YEAR-PSU-CASEID which I am using to assign my match between tables). This case exists in the Main and With table, but doesn’t get matched for the output table and leaves the ‘EVENTS’ column blank. Any thoughts on why this isn't working the way I expect it to?

 

Thank you!

 

ShannonK

Occasional Contributor

Joined:

Nov 3, 2016

Screenshot.jpg

Here is the screen shot.

XanGregg

Staff

Joined:

Jun 23, 2011

Thanks for the screenshot. Very helpful. Can you also include an screenshot of the dialog or the text of the Joined "Source" script? Maybe there's a problem with the match specification.

 

Another diagnostic is to turn on the Match Flag option to create a new column in the output that indicates whether the row comes from one table or both.

 

In you image, it looks like the first Joined table row only matches the Main table and not the With table, which results in the fields from the With table being assigned missing values.

 

Since there are "PSU of ..." columns present it looks like the "Merge same name columns" option wasn't set. 

ShannonK

Occasional Contributor

Joined:

Nov 3, 2016

Screenshot2.jpg

This is a screenshot of the Join dialogue that I have been trying. For the earlier screenshot, I chose YEAR, PSU, CASEID to be output from both original tables by selecting them for the 'Output Columns', which I haven't shown here. I have also done a Match Flag output in the past and it shows that it is getting the data from only one table in lots of cases (flags 1 or 2). It confirms that there is no match happening, but I don't know why it's not finding a match despite the data being in both original tables.

txnelson

Super User

Joined:

Jun 22, 2012

I still want to focus on the values. Your Year column, is it a JMP Date value, formatted to just show the year portion of the date, or is it an integer?

What are the specifics about the CASEID column. It is being displayed as a character column. Is this the case or is it a numeric column being displayed using a Hex format?
If it is character, there can be blanks or unprintable characters in the string that might be causing the mismatches. If you use the match flag, it is fairly easy to check this out when you find the row with the match flag of 1 and compare it to the row from the match flag 2 row that should have matched with it.

Jim
ShannonK

Occasional Contributor

Joined:

Nov 3, 2016

The Year column is an integer, not a date.

In looking into your question about CASEID, it appears that it is of data type Character. I see now that sometimes the CASEID has a space before the number. Do you think that is the issue? I'll work on removing the blank spaces to see if this fixes my join issues. Thanks for your help!

txnelson

Super User

Joined:

Jun 22, 2012

Solution
that would clearly cause a problem
Jim