Subscribe Bookmark RSS Feed

How to Join and get common matching columns

a01

Community Trekker

Joined:

Nov 14, 2014

One of the things that I'm struggling with in JMP is the fact that, when I want to do a many-for-many table join using the "Join" function on matching columns, then the match columns in the resulting table always get/stay segregated between each of the two source tables.  JMP doesn't seem to provide an automatic or convenient way to get the common match column(s) with the values filled in for all rows regardless of which source table the key was present in.  It would be able to get such common fully-filled match column(s), with "presence" columns to indicate whether that key was present in each source table.

Let me illustrate with an example.  Suppose I have two tables, "ArtClass" and "GymClass".  The "primary key" in each table is the combination of "FirstName" and "LastInitial".

ArtClass:

FirstNameLastInitialFavorite colorScore
JohnSRed4
JohnBBlue5
AliceAGreen6
AliceMYellow7
BobSPurple6
SallyCBlue5

GymClass:

FirstNameLastInitialFavorite activityScore
JohnSMonkey bars8
JohnAShot put7
AlicePSoccer6
BobSBasketball5
SallyCField hockey5
BobMCrunches6

If I do a Join on Matching Columns, equating FirstName and LastInitial, without dropping multiples and including non-matches in both tables (an "outer join"), then I obtain the following, which we'll call "Students1" (Edit: Please forgive me if any of these tables are getting cut off the screen; maybe try copying this into a MS Word document because of how this webpage displays.):

Students1:    

FirstName of ArtClassLastInitial of ArtClassFavorite colorScore of ArtClassFirstName of GymClassLastInitial of GymClassFavorite activityScore of GymClass
JohnAShot put7
AlicePSoccer6
BobMCrunches6
JohnSRed4JohnSMonkey bars8
JohnBBlue5
AliceAGreen6
AliceMYellow7
BobSPurple6BobSBasketball5
SallyCBlue5SallyCField hockey5

What I would much rather get is the following kind of table, which we'll call "Students2":

Students2:

FirstNameLastInitialFavorite colorScore of ArtClassPresent in ArtClassFavorite activityScore of GymClassPresent in GymClass
AliceAGreen610
AliceMYellow710
AliceP0Soccer61
BobM0Crunches61
BobSPurple61Basketball51
JohnA0Shot put71
JohnBBlue510
JohnSRed41Monkey bars81
SallyCBlue51Field hockey51

Notice how in the second version, I have just one common set of the matching columns -- FirstName and LastInitial -- which is fully populated for all rows.  I also have columns to tell me if the key was present in each of the source tables (ArtClass and GymClass).  In contrast, in the first version (Students1), the only way you can tell if each key was present in each source table, is by seeing if the segregated key columns ("FirstName of ArtClass"/"LastInitial of ArtClass", "FirstName of GymClass"/"LastInitial of GymClass") are missing or present.

A way that I contrived the second version, Students2, was by doing the following roundabout procedure:

  1. Subset or Summary each of the two source tables (ArtClass and GymClass) by the match columns, "FirstName" and "LastInitial".
  2. Concatenate the two subset tables made in #1.
  3. Summary the table made in #2 by the match columns ("FirstName", "LastInitial") to remove duplicates and get the unique keys of the joined table; remove the "N Rows" column.
  4. In ArtClass, temporarily add a constant column "Present in ArtClass" set to all ones.
  5. In GymClass, temporarily add a constant column "Present in GymClass" set to all ones.
  6. In ArtClass, temporarily change "Score" to "Score of ArtClass" to disambiguate this column from the one in GymClass.
  7. In GymClass, temporarily change "Score" to "Score of GymClass" to disambiguate this column from the one in ArtClass.
  8. In the table made in #3, Update from each of the two temporarily-modified source tables (ArtClass and GymClass), matching on "FirstName" and "LastInitial" and adding all columns.  (2 separate operations)
  9. In the same table, select the columns "Present in ArtClass" and "Present in GymClass"; do Standardize Attributes and Recode the missing value to be 0 instead, so that the "presence" values are 0 or 1.  This is the desired result, Students2.

Note that, in this procedure for making Students2, I had to manually segregate the conflicting non-key column, "Score", which the Join function had originally done for me automatically when making Students1 (by changing each "Score" to "Score of..."), which is what I wanted it to do.  However, the problem with the Join function is that it does the same thing for the key columns, which I do not want it to do -- I want the key columns to be common for the entire joined table, with separate columns to flag the presence in each source table.  My procedure above is not scalable to situations where you might have many more conflicting non-key columns for your data.

The only alternative method I've realized, is to use the Join function as before, but to manually add a formula column for every single match column that would combine the segregated pair of that match column (e.g. "FirstName" would be a formula column on "FirstName of ArtClass" and "FirstName of GymClass"), and also add formula columns for presence in each source table that would have to test if all of the segregated match columns corresponding to each source table are missing or not (for example, "Present in ArtClass" would be a forumula of whether both "FirstName of ArtClass" and "LastInitial of ArtClass" are missing).  This is also a very tedious, roundabout process.

I'm wondering if there's a better, efficient solution for what I'm trying to do -- i.e. to generate Students2 in my example?  Would it be with a script or add-in of some kind?

In addition, would SAS potentially consider something like this as a future feature extension for JMP?

Appreciate any assistance with this.

3 REPLIES
Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

There's a "Merge same name columns" option in the Join dialog that will do as its name suggests and get you one set of First Name and Last Initial columns.

In your example, you've got a column named Score in each table so you'll need to rename them to Gym Score and Art Score. After doing that you can complete the Join dialog like this:

8008_Join_and_Art_Class_and_Gym_Class.png

And you get a data table like this:

8010_untitled_13_and_Art_Class.png

The Match Flag column indicates which tables contributed to each row. 1==Main Table, 2==With Table, 3 == Both.

I think that's what you're looking for, or awfully close anyway.

-Jeff

-Jeff
a01

Community Trekker

Joined:

Nov 14, 2014

Ah, excellent!  (Thanks for the fast response.)  I guess that was right under my nose in a way, and I can't quite remember if I might have tried that checkbox a long time ago and forgotten about what it does.  I'd say that solves the majority of this problem, because it's pretty simple to Recode "Match Flag" (e.g. into "ArtClass"/"GymClass"/"Both") or split it up into two separate "presence" columns if desired (like I had in Students2).

I suppose the only caveat here, however, is that there could be a lot of manual column renaming required if there are many analogous non-key columns with common names like "Score" in my example.  It looks like if you don't rename the common non-key column(s), then apparently those column(s) in the resulting joined table seem to take the value arbitrarily from one source table or the other.  (Perhaps there is some internal method as to which one "wins".)

In this way, the "Join" utility is giving us a choice between either:

     (a) disambiguating everything automatically which we want but segregating the key columns which we don't want, when we don't select "Merge same column names", as in generating Students1 in my post, or

     (b) commonly combining the match columns and giving a presence-indicating column, which we want, but letting non-key columns with common names conflict with each other, which we don't want -- when we do select "Merge same column names".

Do you know of a good workaround to this caveat?

Otherwise, thanks for pointing this out!

Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

I think you've summarized the behaviors and limitations pretty well.

I'll pass them along to our development staff to see if we can make some improvements in a future release.

-Jeff

-Jeff