cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
HSS
HSS
Level IV

Looking to generate two columns based on some pattern from other columns

Hi All,

I am looking to generate two new columns based on patterns in other two columns ( "Required column1" and "Required Column2"). Please see the attached image -

HSS_1-1639688163428.png

 

Any help/suggestion ?

Thanks. 

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Looking to generate two columns based on some pattern from other columns

I gave a go with this. I try to avoid using Lag() whenever possible and replace it with Col functions. The most difficult formula seems to be Required Column 1 and I couldn't quickly figure out good formula without using Lag() (there is most likely a solution with Col Cumulative Sum somehow). One idea would be to fill places between Start and End with "Ongoing" or something similar, then you could maybe use Col Cumulative Sum easier, but then the most difficult formula would most likely just move to Start and End.

View more...
Names Default To Here(1);

dt = Current Data Table();
//Numbering of Occurrence
dt << New Column("NumberOfOcc", Numeric, Nominal, Formula(
	If(:Sensor == "Primary",
		Col Cumulative Sum(:Sensor == "Primary", :ID, :"Sub-set"n, :Occurrence),
		.
	);
));

//Start and End
dt << New column("StartEnd", Character, Nominal, Formula(
	If(:NumberOfOcc == 1, 
		"End"
	, Col Max(:NumberOfOcc, :ID,:"Sub-set"n, :Occurrence) == :NumberOfOcc,
		"Start"
	, //else
		""
	)
));

//Required Column 1, this could most likely be made somehow with Col Cumulative Sum
dt << New column("Req1", Numeric, Ordinal, Formula(
	If(Row() == 1,
		val = .
	);
	If(:StartEnd == "Start",
		val = 1
	, IsMissing(:StartEnd) | :StartEnd == "End",
		val += 1;
	);
	If(:Lag(:StartEnd) == "End" | (Row() == Col Max(Row()) & :StartEnd == "Start") | (:StartEnd == "Start" & Lag(:StartEnd, -1) == "End"), //more missing checks can be added here
		val = .,
		val;
	)
));

//Required Column 2
dt << New column("Req2", Numeric, Nominal, Formula(
	If(!IsMissing(:Req1),
		Col Cumulative Sum(:StartEnd == "Start",
			:ID, :"Sub-set"n
		)
	,
		.
	)
));

If you have time, I would suggest testing both ways of solving this, the one @txnelson has provided with more heavy use of Lag and comparisons, and the one I have with Col formulas. Both definitely have their use cases and understanding them will help you in a long run when writing formulas/calculations in JMP. And in some cases when you understand how to combine both, you can solve quite difficult tasks with just formulas.

 

Also after you get the logic working, I would most likely remove Formulas from the datatable (or replace Formula() with << Set Each value in script), because this will most likely break if order is changed in datatable.

 

-Jarmo

View solution in original post

14 REPLIES 14
txnelson
Super User

Re: Looking to generate two columns based on some pattern from other columns

Here is a formula for the column you refer to Required Column 2.  When you study the code, you will see that all it is doing is applying the logic rules you have stated that need to be checked when determining the Start and End values.  It also contains the setting of the value for the column Required Column 1.  To run this formula, simply create the Required Column 1 and Required Column 2.  Set Required Column 2 to be a character column.  Then open the formula for Required Column 2 and copy the formula below into the column.

txnelson_0-1639694637149.png

RCValue = "";
:Required Column 1[Row()] = .;
If( :Sensor != "Primary",
	other = "Yes"
);
If( Row() == 1 | :ID != Lag( :ID ) | :"Sub-set"n != Lag( :"Sub-set"n ),
	counter = 0;
	withinStart = "No";
	other = "Yes";
);
If(
	:Sensor == "Primary" & withinStart == "Yes",
		RCValue = "End";
		withinStart = "No";
		other = "No";,
	:Sensor == "Primary" & Lag( :Sensor, -1 ) != "Primary" & withinStart == "No" & other == "Yes" & Row() !=
	N Rows( Current Data Table() ),
		withinStart = "Yes";
		counter = counter + 1;
		RCValue = "Start";
);
If( withinStart == "Yes" | RCValue == "End",
	:Required Column 1[Row()] = counter
);
RCValue;

 

Jim
HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Hi @txnelson

It almost worked with few false entries. I believe we must add extra condition on :Occurrence column. That is :Occurrence must change to set :Required Column 2 == "Start" (and may be option - it should be same when :Required Column 2 == "End").

 

Something like - Dif(:Occurrence) !=0 then only  :Required Column 2 == "Start"

For my sample data (95 rows) was fine with your formula but my actual data which has more than 300k rows, it is giving many wrong entries like below - (I think it should be fine once additional requirement on :Occurrence will be added.) This is just one off screenshot of the wrong entry.

HSS_0-1639707466272.png


Hopefully!!  I can modify the formula/code and can add requirement of :Occurrence. on top of your formula.

Thanks again for your support.



HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Hi @txnelson and All,

I was able to modify your code based perfectly for my requirement. But on the 2nd stage troubleshooting I need something like this --

Mark :Sensor = "Primary" (1st) around consecutive change in :Occurrence.    In the given screenshot -

 

required values in between ---  Last "Primary" in Occurrence = 2  and  First "Primary" in Occurrence = 3  [consecutive change in Occurrence]

Finding first Primary is not that difficult, not sure how to find last primary ? Thinking to somehow invert (descending order) the subset data for Occurrence = 2, find the value and then revert it back. Not able to put in a formula!! Any other way to do that ?

Any help?
Thanks.

HSS_0-1640075781746.png

 



txnelson
Super User

Re: Looking to generate two columns based on some pattern from other columns

You have modified the code I previously provided to get the results that you provide in your most recent response. Please provide the JSL you are using to get the results you are showing in your sample data table.

Jim
HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Hi @txnelson , thanks for your response.

I am not able to get the correct (updated requirement) start point. In the Data Table ("Sample Data Updated") and image, I had entered corrected data columns  manually. 

The formula. I used to get "Required Column 1 Old" and " Required Column 2 Old" is this (It is just like you give me, I added occurrence condition on it) -

RCValue = "";
:Required Column 1[Row()] = .;
If( :Occurrence != Lag( Occurrence, -1 ),
	other = "Yes"
);
If( Row() == 1 | :ID != Lag( :ID ) | :"Sub-set"n != Lag( :"Sub-set"n ),
	counter = 0;
	withinStart = "No";
	other = "Yes";
);
If(
	:Sensor == "Primary" & :Occurrence == Lag( :Occurrence, -1 ) & Dif( :Occurrence ) == 0 &
	withinStart == "Yes",
		RCValue = "End";
		withinStart = "No";
		other = "No";,
	:Occurrence != Lag( Occurrence, -1 ) & Lag( :Sensor, -1 ) != "Primary" & withinStart == "No"
	 & other == "Yes" & Row() != N Rows( Current Data Table() ),
		withinStart = "Yes";
		counter = counter + 1;
		RCValue = "Start";
);
If( withinStart == "Yes" | RCValue == "End",
	:Required Column 1[Row()] = counter
);
RCValue;

HSS_0-1640088826157.png

Many thanks.

txnelson
Super User

Re: Looking to generate two columns based on some pattern from other columns

i am confused.  From what I can see, the script that I wrote seems to work, except that after line 44 in the data table, you have changed the rules, and now want the End to be on the row prior to the next value of Primary.  Which is what you want?

txnelson_0-1640108529298.png

 

Jim
HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Hi Jim,

Sorry for the confusion.

Your original formula (I believe, I have not messed it up) worked for most of the cases but also captured few wrong points and missed few on full set of data. In the attached subset of the data (only for one "ID" and one  value of "Sub-set") there is a column based on your formula which has few wrong points.

Rows:  647-649 wrong entry (Red Colour)
Rows: 868-937 not captured (Red Colour)
Rows: 1307-1311 wrong entry (Red Colour)

And this was fixed when I put condition on :Occurrence. 

Now I am looking for -

1. Last Primary Sensor in occurrence = n (say n = 2), mark it as "Start"

2. First Primary Sensor in Occurrence = n+1 (= 3), and if there is no primary sensor for n = 3, it should stope/mark the value in Occurrence = n+2 == 4. Mark it as "End"

3. And last fill the numbers between "Start" and "End" in a new column.

Alternate way to think of (may be) --
If we can mark - in any single occurrence n (n = 2) First Primary Sensor as END and in last primary sensor as Start and then fill a values between Start and End in a new column.

I hope this will help and make things clear.

Thanks,
I think this can be, if I can find a 








txnelson
Super User

Re: Looking to generate two columns based on some pattern from other columns

The issue appears to be with the logic being applied to the JSL variable "other".  I do not have time to work on the issue today, but tomorrow afternoon, I should be able to do some investigation.

Jim
HSS
HSS
Level IV

Re: Looking to generate two columns based on some pattern from other columns

Thank you @txnelson ,

I did some progress - and identify correctly the start and end point of each set. I used a very simple logic, you can see in the formula column.

But now I am struggling to put numbers between each pair of "Start" and "End" point. I getting issue in stopping the counter at "End". My issue might have the same complicacy but at least now I have clearing marking/tag on my start and end point.

Any help with this ?
Thanks.

HSS_0-1640247234316.png

Thanks.