cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
HSS
HSS
Level IV

Looking to generate two columns based on some pattern from other columns

Hi All,

I am looking to generate two new columns based on patterns in other two columns ( "Required column1" and "Required Column2"). Please see the attached image -

HSS_1-1639688163428.png

 

Any help/suggestion ?

Thanks. 

14 REPLIES 14
jthi
Super User

Re: Looking to generate two columns based on some pattern from other columns

I gave a go with this. I try to avoid using Lag() whenever possible and replace it with Col functions. The most difficult formula seems to be Required Column 1 and I couldn't quickly figure out good formula without using Lag() (there is most likely a solution with Col Cumulative Sum somehow). One idea would be to fill places between Start and End with "Ongoing" or something similar, then you could maybe use Col Cumulative Sum easier, but then the most difficult formula would most likely just move to Start and End.

View more...
Names Default To Here(1);

dt = Current Data Table();
//Numbering of Occurrence
dt << New Column("NumberOfOcc", Numeric, Nominal, Formula(
	If(:Sensor == "Primary",
		Col Cumulative Sum(:Sensor == "Primary", :ID, :"Sub-set"n, :Occurrence),
		.
	);
));

//Start and End
dt << New column("StartEnd", Character, Nominal, Formula(
	If(:NumberOfOcc == 1, 
		"End"
	, Col Max(:NumberOfOcc, :ID,:"Sub-set"n, :Occurrence) == :NumberOfOcc,
		"Start"
	, //else
		""
	)
));

//Required Column 1, this could most likely be made somehow with Col Cumulative Sum
dt << New column("Req1", Numeric, Ordinal, Formula(
	If(Row() == 1,
		val = .
	);
	If(:StartEnd == "Start",
		val = 1
	, IsMissing(:StartEnd) | :StartEnd == "End",
		val += 1;
	);
	If(:Lag(:StartEnd) == "End" | (Row() == Col Max(Row()) & :StartEnd == "Start") | (:StartEnd == "Start" & Lag(:StartEnd, -1) == "End"), //more missing checks can be added here
		val = .,
		val;
	)
));

//Required Column 2
dt << New column("Req2", Numeric, Nominal, Formula(
	If(!IsMissing(:Req1),
		Col Cumulative Sum(:StartEnd == "Start",
			:ID, :"Sub-set"n
		)
	,
		.
	)
));

If you have time, I would suggest testing both ways of solving this, the one @txnelson has provided with more heavy use of Lag and comparisons, and the one I have with Col formulas. Both definitely have their use cases and understanding them will help you in a long run when writing formulas/calculations in JMP. And in some cases when you understand how to combine both, you can solve quite difficult tasks with just formulas.

 

Also after you get the logic working, I would most likely remove Formulas from the datatable (or replace Formula() with << Set Each value in script), because this will most likely break if order is changed in datatable.

 

-Jarmo
HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Dear @jthi (Jarmo) and Dear @txnelson (Jim),

I am really thankful for your support and kindness.

I will certainly find one working way by either of these two methods and some combination of these two.

Many thanks again to both of you.
Merry Christmas and Happy New Year. Enjoy your time.

Best Regards,
Harry

HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Hi All,

Have another query, actually two -

1. What is the difference, If I add more than one condition in single "If" statement vs. Adding more than one "If" statement with single condition. See the attached image - . First case is not giving correct results while the 2nd one is working fine. Results can be seen in the attached sample data. I am having difficulty to understand the differences here.

To me it seems for each row -  in first case, if condition is true, output is given. This ends the results.
While in 2nd case, it will run all 3 if statements sequentially, irrespective of results of previous one and out put will be based on the last true condition. I am not sure if I am able to explain it correctly. Again, I guess in 2nd case (multiple if statement), there is a sort of overwrite on the previous If statement !

HSS_0-1640604998632.png


2.  There is a condition used in the above formula --  "   Row() == Col Max(Row()) & :StartEnd == "Start"   "   How is it working? It works well, means - it is well written. But I am not able to visualize how the variable/counter "val" stops ?

Any help with it ?

Thanks,

 

jthi
Super User

Re: Looking to generate two columns based on some pattern from other columns

1. Maybe this image gives explanation why the first idea won't work (if I remember correctly the second solution relies on that if you calculate  . + 1 it will be .

jthi_2-1640633086678.png

I did take the different comparisons from the If statement and just create formulas out of them so it is easier to see what returns True and when.

You could most likely get the first solution working by modifying the comparisons a bit ( Is Missing(:StartEnd) & Lag(:StartEnd) != "End" | :StartEnd == "End", val += 1). But with this change you will have to make small change to second comparison also (add Row() != Col Max(Row()) check).

 

2. Idea for Row() == Col Max(Row()) & :StartEnd == "Start" is just to to check if we are on last row of the table and :StartEnd value is "Start" we won't set the value as 1 based on previous if statement. Row() will tell you row number and Col Max(Row()) will return you maximum row number in the datatable. Might be a bit unnecessary check because it is last row of table (like the one after it with Start/End), but as it was in the results column, I did implement them.

 

The general idea with the last if-else was following:

1. Lag(:StartEnd) == "End" -> this is general "stop" after End is countered

2. Row() == Col Max(Row()) & :StartEnd == "Start" -> to prevent returning 1 on last row if it is Start

3. :StartEnd == "Start" & Lag(:StartEnd, -1) == "End" -> to prevent returning 1 and 2 if there are no rows between Start and End

 

You can use just one if with multiple comparisons by moving some of the comparisons around a bit (this seemed to return same result as Req1):

If(
	Row() == 1, val = .,
	Row() != Col Max(Row()) & :StartEnd == "Start" & Lag(:StartEnd, -1) != "End", val = 1,
	Is Missing(:StartEnd) & Lag(:StartEnd) != "End" | :StartEnd == "End", val += 1,
	val =.
)

 

Hopefully the explanation isn't too messy

-Jarmo
HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Thanks @jthi for nice explanation.