cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
joann
Level IV

create new id to deal with multiple to multiple relationship

*Update

Sorry I should be more clear on the question. The data i currently have looks like:

id    email

1     a

1     b

2     a

2     c

3     d

We can see that id 1 2 and email a b c are actually the same person because the info overlaps. 

What I'm trying to do here is to come up with a match table with a new key which i can use to match back to the original table to identify those overlap ones are actually the same person:

newID   id/email

111        1

111        2

111        a

111        b

111        c

222       d

The newID doesn't need to be in a specific form, it can even equal to the original id, as long as we can use it to identify a "unique person".

-------------------------------------

Hi, I have data with 2 columns, one is individual id, and one is email. The problem with this data is that the relationship between individual id and email is multiple to multiple, meaning one individual id could has multiple emails, and one email could have multiple individual ids. Thus i’d need to create a mapping table of a new unique id to all individualID/email, which specifies that those individual ids or emails if any overlap are actually the same person. I couldn’t really think of a way to do it without script. Could anyone help? Thank you!

 

 

22 REPLIES 22
joann
Level IV

Re: create new id to deal with multiple to multiple relationship

Thank you so much Jarmo! This outcome out of your script is exactly what i need. I'm not familiar with script so just one following question on how to apply it on my current table. the two column names are "Individualid" and "Email_HASHED", I pasted the below edited script to my table but it showed error. could you debug for me? Thank you!

New Column("newid", numeric, ordinal);

createIndex = function({tempRows, index},
wait(0.5); //for visualization
idList = Individualid[tempRows,0];
emailList = Email_HASHED[tempRows,0];
tempRowsNew = Get Rows Where(Contains(idList, :Individualid) | Contains(emailList, :Email_HASHED));
Select Where(Contains(idList, :Individualid) | Contains(emailList, :Email_HASHED)); //for visualization
wait(0.5); //for visualization
If(N Items(tempRows) != N Items(tempRowsNew),
ids = Individualid[tempRowsNew,0];
emails = Email_HASHED[tempRowsNew,0];
Recurse(tempRowsNew, index),
If(All(tempRows == tempRowsNew),
newId[tempRowsNew] = index;
newRow = Contains(newId << get as matrix, .);
If(newRow == 0,
Clear Select;
return("done"),
ids = Individualid[newRow,0];
emails = Email_HASHED[newRow,0];
Recurse(matrix(newRow), index + 1);
);

)
);
);

wait(1);
createIndex([1], 1);



jthi
Super User

Re: create new id to deal with multiple to multiple relationship

I modified the script a little (changed recursion to while loop as recursion might be a bit difficult from time to time to understand). Also changed column names.

Names Default To Here(1);
//https://community.jmp.com/t5/Discussions/create-new-id-to-deal-with-multiple-to-multiple-relationship/m-p/365298#M61470
dt = New Table("Untitled",
	Add Rows(20),
	Compress File When Saved(1),
	New Column("Individualid",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values([1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9])
	),
	New Column("Email_HASHED",
		Character,
		"Nominal",
		Set Values(
			{"a", "b", "a", "c", "d", "e", "f", "g", "a", "h", "i", "j", "k", "l",
			"m", "n", "o", "p", "q", "b"}
		)
	)
);

dt << Size Window(500,900);
dt << New Column("newid", numeric, ordinal);

//setup first values before While loop
rowList = [1];
idList = dt:Individualid[rowList,0];
emailList = dt:Email_HASHED[rowList,0];
index = 1;
//looping
While(1,
	oldRowList = rowList;
	rowList = dt << Get Rows Where(Contains(idList, :Individualid) | Contains(emailList, :Email_HASHED));
	dt << Select Where(Contains(idList, :Individualid) | Contains(emailList, :Email_HASHED)); //for visualization
	wait(1); //for visualization
	If(N Items(oldRowList) != N Items(rowList),
		idList = dt:Individualid[rowList,0];
		emailList = dt:Email_HASHED[rowList,0],
		If(All(oldRowList == rowList),
			dt:newid[rowList] = index;
			newRow = Contains(dt:newid << get as matrix, .);
			If(newRow == 0,
				break(),
			);
			index += 1;
			rowList = Matrix(newRow);
			idList = dt:Individualid[rowList,0];
			emailList = dt:Email_HASHED[rowList,0];
		);
	);
);

You seen to have removed the dt references from your modified version, that could cause issues.

-Jarmo
joann
Level IV

Re: create new id to deal with multiple to multiple relationship

Thank you Jarmo! How do I change the dt to my current data table but not creating the new table with the samples you put in?
pauldeen
Level VI

Re: create new id to deal with multiple to multiple relationship

dt = current data table();

or

dt = Data table("Name of table here")
jthi
Super User

Re: create new id to deal with multiple to multiple relationship

And depending how you get the datatable you can reference it directly:

 

dt = Open("$SAMPLE_DATA/Big Class.jmp");
dt1 = dt << Subset(All rows);
dt2 = Query(dt, "SELECT name, age, height FROM 'Big Class' WHERE age > 14; ");

and so on

 

-Jarmo
joann
Level IV

Re: create new id to deal with multiple to multiple relationship

Thank you so much Jarmo! It works!! Really appreciate your help.
joann
Level IV

Re: create new id to deal with multiple to multiple relationship

Hi Jarmo, I forgot to mention that in the email column there are some rows having empty cells, meaning some individualID does not have anything populated under email column. I found that the current script will consider all individual ids having no emails as the same new id. could you help adjust the script to avoid this kind of situation? Thank you!
jthi
Super User

Re: create new id to deal with multiple to multiple relationship

How should they be handled? Most likely it is one or two if statements.

Small example dataset with correct answers could help. 

-Jarmo
joann
Level IV

Re: create new id to deal with multiple to multiple relationship

Hi Jarmo, below is an example. you can see many rows having empty email are all grouped as newid 5. If the individualID is not overlapped with other groups, those without emails should each have their own newid. Thank you!

 

Screen Shot 2021-03-09 at 10.36.39 AM.png

jthi
Super User

Re: create new id to deal with multiple to multiple relationship

Couldn't replicate your results (most likely you have different individual ids on the one starting with 308....)

 

Does the result from this look correct?

Names Default To Here(1);
//https://community.jmp.com/t5/Discussions/create-new-id-to-deal-with-multiple-to-multiple-relationship/m-p/365298#M61470

/*
dt = New Table("Untitled",
	Add Rows(20),
	Compress File When Saved(1),
	New Column("Individualid",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values([1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9])
	),
	New Column("Email_HASHED",
		Character,
		"Nominal",
		Set Values(
			{"a", "b", "a", "c", "d", "e", "f", "g", "a", "h", "i", "j", "k", "l",
			"m", "n", "o", "p", "q", "b"}
		)
	)
);
*/

dt = New Table("Untitled",
	Add Rows(19),
	Compress File When Saved(1),
	New Column("Individualid",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values(
			[139, 100, 229, 301, 308, 284, 682, 136, 586, 256, 122, 711, 629, 116,
			152, 443, 301, 695, 311]
		)
	),
	New Column("Email_HASHED",
		Character,
		"Nominal",
		Set Values(
			{"A648", "CAA", "E7B", "5B2", "", "237A", "", "FE0", "", "7C21", "",
			"920B", "", "D394", "", "2FE1", "", "", ""}
		)
	)
);

dt << Size Window(500,900);
dt << New Column("newid", numeric, ordinal);

//setup first values before While loop
rowList = [1];
idList = dt:Individualid[rowList,0];
emailList = dt:Email_HASHED[rowList,0];
index = 1;
//looping
While(1,
	oldRowList = rowList;
	Remove From(emailList, Contains(emailList, ""));
	rowList = dt << Get Rows Where(Contains(idList, :Individualid) | Contains(emailList, :Email_HASHED));
	//dt << Select Where(Contains(idList, :Individualid) | Contains(emailList, :Email_HASHED)); //for visualization
	//wait(1); //for visualization
	If(N Items(oldRowList) != N Items(rowList),
		idList = dt:Individualid[rowList,0];
		emailList = dt:Email_HASHED[rowList,0],
		If(All(oldRowList == rowList),
			dt:newid[rowList] = index;
			newRow = Contains(dt:newid << get as matrix, .);
			
			newRow = 0;
			For Each Row(dt,
				If(:Email_HASHED != "" & IsMissing(:newId),
					newRow = Row();
					break();
				)
			);
			
			If(newRow == 0,
				break(),
			);
			index += 1;
			rowList = Matrix(newRow);
			idList = dt:Individualid[rowList,0];
			emailList = dt:Email_HASHED[rowList,0];
		);
	);
);
//dt << Clear Select;
//handle missings
index++;
missingRows = dt << Get Rows Where(IsMissing(:newId));
dt:newId[missingRows] = index

 

-Jarmo