This website uses Cookies. Click Accept to agree to our website's cookie use as described in our Privacy Policy. Click Preferences to customize your cookie settings.
Looking to generate two columns based on some pattern from other columns
This widget could not be displayed.
Created:
Dec 16, 2021 03:56 PM
| Last Modified: Jun 9, 2023 11:07 AM(2011 views)
Hi All,
I am looking to generate two new columns based on patterns in other two columns ( "Required column1" and "Required Column2"). Please see the attached image -
Re: Looking to generate two columns based on some pattern from other columns
This widget could not be displayed.
Created:
Dec 23, 2021 05:05 AM
| Last Modified: Dec 23, 2021 2:05 AM(1809 views)
| Posted in reply to message from HSS 12-23-2021
I gave a go with this. I try to avoid using Lag() whenever possible and replace it with Col functions. The most difficult formula seems to be Required Column 1 and I couldn't quickly figure out good formula without using Lag() (there is most likely a solution with Col Cumulative Sum somehow). One idea would be to fill places between Start and End with "Ongoing" or something similar, then you could maybe use Col Cumulative Sum easier, but then the most difficult formula would most likely just move to Start and End.
Names Default To Here(1);
dt = Current Data Table();
//Numbering of Occurrence
dt << New Column("NumberOfOcc", Numeric, Nominal, Formula(
If(:Sensor == "Primary",
Col Cumulative Sum(:Sensor == "Primary", :ID, :"Sub-set"n, :Occurrence),
.
);
));
//Start and End
dt << New column("StartEnd", Character, Nominal, Formula(
If(:NumberOfOcc == 1,
"End"
, Col Max(:NumberOfOcc, :ID,:"Sub-set"n, :Occurrence) == :NumberOfOcc,
"Start"
, //else
""
)
));
//Required Column 1, this could most likely be made somehow with Col Cumulative Sum
dt << New column("Req1", Numeric, Ordinal, Formula(
If(Row() == 1,
val = .
);
If(:StartEnd == "Start",
val = 1
, IsMissing(:StartEnd) | :StartEnd == "End",
val += 1;
);
If(:Lag(:StartEnd) == "End" | (Row() == Col Max(Row()) & :StartEnd == "Start") | (:StartEnd == "Start" & Lag(:StartEnd, -1) == "End"), //more missing checks can be added here
val = .,
val;
)
));
//Required Column 2
dt << New column("Req2", Numeric, Nominal, Formula(
If(!IsMissing(:Req1),
Col Cumulative Sum(:StartEnd == "Start",
:ID, :"Sub-set"n
)
,
.
)
));
If you have time, I would suggest testing both ways of solving this, the one @txnelson has provided with more heavy use of Lag and comparisons, and the one I have with Col formulas. Both definitely have their use cases and understanding them will help you in a long run when writing formulas/calculations in JMP. And in some cases when you understand how to combine both, you can solve quite difficult tasks with just formulas.
Also after you get the logic working, I would most likely remove Formulas from the datatable (or replace Formula() with << Set Each value in script), because this will most likely break if order is changed in datatable.
Re: Looking to generate two columns based on some pattern from other columns
This widget could not be displayed.
Created:
Dec 16, 2021 05:46 PM
| Last Modified: Dec 16, 2021 2:46 PM(1425 views)
| Posted in reply to message from HSS 12-16-2021
Here is a formula for the column you refer to Required Column 2. When you study the code, you will see that all it is doing is applying the logic rules you have stated that need to be checked when determining the Start and End values. It also contains the setting of the value for the column Required Column 1. To run this formula, simply create the Required Column 1 and Required Column 2. Set Required Column 2 to be a character column. Then open the formula for Required Column 2 and copy the formula below into the column.
It almost worked with few false entries. I believe we must add extra condition on :Occurrence column. That is :Occurrence must change to set :Required Column 2 == "Start" (and may be option - it should be same when :Required Column 2 == "End").
Something like - Dif(:Occurrence) !=0 then only :Required Column 2 == "Start"
For my sample data (95 rows) was fine with your formula but my actual data which has more than 300k rows, it is giving many wrong entries like below - (I think it should be fine once additional requirement on :Occurrence will be added.) This is just one off screenshot of the wrong entry.
Hopefully!! I can modify the formula/code and can add requirement of :Occurrence. on top of your formula.
I was able to modify your code based perfectly for my requirement. But on the 2nd stage troubleshooting I need something like this --
Mark :Sensor = "Primary" (1st) around consecutive change in :Occurrence. In the given screenshot -
required values in between --- Last "Primary" in Occurrence = 2 and First "Primary" in Occurrence = 3 [consecutive change in Occurrence]
Finding first Primary is not that difficult, not sure how to find last primary ? Thinking to somehow invert (descending order) the subset data for Occurrence = 2, find the value and then revert it back. Not able to put in a formula!! Any other way to do that ?
Re: Looking to generate two columns based on some pattern from other columns
This widget could not be displayed.
Created:
Dec 21, 2021 05:58 AM
| Last Modified: Dec 21, 2021 2:58 AM(1340 views)
| Posted in reply to message from HSS 12-21-2021
You have modified the code I previously provided to get the results that you provide in your most recent response. Please provide the JSL you are using to get the results you are showing in your sample data table.
I am not able to get the correct (updated requirement) start point. In the Data Table ("Sample Data Updated") and image, I had entered corrected data columns manually.
The formula. I used to get "Required Column 1 Old" and " Required Column 2 Old" is this (It is just like you give me, I added occurrence condition on it) -
Re: Looking to generate two columns based on some pattern from other columns
This widget could not be displayed.
Created:
Dec 21, 2021 12:43 PM
| Last Modified: Dec 21, 2021 9:43 AM(1297 views)
| Posted in reply to message from HSS 12-21-2021
i am confused. From what I can see, the script that I wrote seems to work, except that after line 44 in the data table, you have changed the rules, and now want the End to be on the row prior to the next value of Primary. Which is what you want?
Re: Looking to generate two columns based on some pattern from other columns
This widget could not be displayed.
Created:
Dec 21, 2021 01:54 PM
| Last Modified: Dec 21, 2021 10:54 AM(1289 views)
| Posted in reply to message from txnelson 12-21-2021
Hi Jim,
Sorry for the confusion.
Your original formula (I believe, I have not messed it up) worked for most of the cases but also captured few wrong points and missed few on full set of data. In the attached subset of the data (only for one "ID" and one value of "Sub-set") there is a column based on your formula which has few wrong points.
And this was fixed when I put condition on :Occurrence.
Now I am looking for -
1. Last Primary Sensor in occurrence = n (say n = 2), mark it as "Start"
2. First Primary Sensor in Occurrence = n+1 (= 3), and if there is no primary sensor for n = 3, it should stope/mark the value in Occurrence = n+2 == 4. Mark it as "End"
3. And last fill the numbers between "Start" and "End" in a new column.
Alternate way to think of (may be) -- If we can mark - in any single occurrence n (n = 2) First Primary Sensor as END and in last primary sensor as Start and then fill a values between Start and End in a new column.
Re: Looking to generate two columns based on some pattern from other columns
This widget could not be displayed.
Created:
Dec 22, 2021 08:37 AM
| Last Modified: Dec 22, 2021 5:37 AM(1266 views)
| Posted in reply to message from HSS 12-21-2021
The issue appears to be with the logic being applied to the JSL variable "other". I do not have time to work on the issue today, but tomorrow afternoon, I should be able to do some investigation.
I did some progress - and identify correctly the start and end point of each set. I used a very simple logic, you can see in the formula column.
But now I am struggling to put numbers between each pair of "Start" and "End" point. I getting issue in stopping the counter at "End". My issue might have the same complicacy but at least now I have clearing marking/tag on my start and end point.