One of the things that I'm struggling with in JMP is the fact that, when I want to do a many-for-many table join using the "Join" function on matching columns, then the match columns in the resulting table always get/stay segregated between each of the two source tables. JMP doesn't seem to provide an automatic or convenient way to get the common match column(s) with the values filled in for all rows regardless of which source table the key was present in. It would be able to get such common fully-filled match column(s), with "presence" columns to indicate whether that key was present in each source table.
Let me illustrate with an example. Suppose I have two tables, "ArtClass" and "GymClass". The "primary key" in each table is the combination of "FirstName" and "LastInitial".
If I do a Join on Matching Columns, equating FirstName and LastInitial, without dropping multiples and including non-matches in both tables (an "outer join"), then I obtain the following, which we'll call "Students1" (Edit: Please forgive me if any of these tables are getting cut off the screen; maybe try copying this into a MS Word document because of how this webpage displays.):
|FirstName of ArtClass||LastInitial of ArtClass||Favorite color||Score of ArtClass||FirstName of GymClass||LastInitial of GymClass||Favorite activity||Score of GymClass|
What I would much rather get is the following kind of table, which we'll call "Students2":
|FirstName||LastInitial||Favorite color||Score of ArtClass||Present in ArtClass||Favorite activity||Score of GymClass||Present in GymClass|
Notice how in the second version, I have just one common set of the matching columns -- FirstName and LastInitial -- which is fully populated for all rows. I also have columns to tell me if the key was present in each of the source tables (ArtClass and GymClass). In contrast, in the first version (Students1), the only way you can tell if each key was present in each source table, is by seeing if the segregated key columns ("FirstName of ArtClass"/"LastInitial of ArtClass", "FirstName of GymClass"/"LastInitial of GymClass") are missing or present.
A way that I contrived the second version, Students2, was by doing the following roundabout procedure:
Note that, in this procedure for making Students2, I had to manually segregate the conflicting non-key column, "Score", which the Join function had originally done for me automatically when making Students1 (by changing each "Score" to "Score of..."), which is what I wanted it to do. However, the problem with the Join function is that it does the same thing for the key columns, which I do not want it to do -- I want the key columns to be common for the entire joined table, with separate columns to flag the presence in each source table. My procedure above is not scalable to situations where you might have many more conflicting non-key columns for your data.
The only alternative method I've realized, is to use the Join function as before, but to manually add a formula column for every single match column that would combine the segregated pair of that match column (e.g. "FirstName" would be a formula column on "FirstName of ArtClass" and "FirstName of GymClass"), and also add formula columns for presence in each source table that would have to test if all of the segregated match columns corresponding to each source table are missing or not (for example, "Present in ArtClass" would be a forumula of whether both "FirstName of ArtClass" and "LastInitial of ArtClass" are missing). This is also a very tedious, roundabout process.
I'm wondering if there's a better, efficient solution for what I'm trying to do -- i.e. to generate Students2 in my example? Would it be with a script or add-in of some kind?
In addition, would SAS potentially consider something like this as a future feature extension for JMP?
Appreciate any assistance with this.
There's a "Merge same name columns" option in the Join dialog that will do as its name suggests and get you one set of First Name and Last Initial columns.
In your example, you've got a column named Score in each table so you'll need to rename them to Gym Score and Art Score. After doing that you can complete the Join dialog like this:
And you get a data table like this:
The Match Flag column indicates which tables contributed to each row. 1==Main Table, 2==With Table, 3 == Both.
I think that's what you're looking for, or awfully close anyway.
Ah, excellent! (Thanks for the fast response.) I guess that was right under my nose in a way, and I can't quite remember if I might have tried that checkbox a long time ago and forgotten about what it does. I'd say that solves the majority of this problem, because it's pretty simple to Recode "Match Flag" (e.g. into "ArtClass"/"GymClass"/"Both") or split it up into two separate "presence" columns if desired (like I had in Students2).
I suppose the only caveat here, however, is that there could be a lot of manual column renaming required if there are many analogous non-key columns with common names like "Score" in my example. It looks like if you don't rename the common non-key column(s), then apparently those column(s) in the resulting joined table seem to take the value arbitrarily from one source table or the other. (Perhaps there is some internal method as to which one "wins".)
In this way, the "Join" utility is giving us a choice between either:
(a) disambiguating everything automatically which we want but segregating the key columns which we don't want, when we don't select "Merge same column names", as in generating Students1 in my post, or
(b) commonly combining the match columns and giving a presence-indicating column, which we want, but letting non-key columns with common names conflict with each other, which we don't want -- when we do select "Merge same column names".
Do you know of a good workaround to this caveat?
Otherwise, thanks for pointing this out!
I think you've summarized the behaviors and limitations pretty well.
I'll pass them along to our development staff to see if we can make some improvements in a future release.