cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
ubpk
Level II

Overlap / Count of periods at the same point of time

Hi all,

 

I currently struggle with some data visualisation. I have a huge table containing thousands of events which are captured incl. start and end time of different events on different locations (these are displayed as text) in the following format which I successfully converted to be recognised as time:

Start: 2022-01-28T22:26:13

End: 2022-01-28T23:10:56

 

I would now like to visualise / count the number of parallel events, i.e. overlay all resulting periods and see the max number of parallel events over the different locations. I also have the duration available as the difference between start and end time, so in case this is needed....

 

Maybe one can help me?

 

Cheers!

 

JMP16

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Overlap / Count of periods at the same point of time

If the idea is to only get maximum amount of overlaps for location, you could first calculate all overlaps by adding start and end times to a matrix and then sorting it while keeping track what is end and what is start. Then you can calculate how many values fit between each start-end pair.

 

Maybe something like this will work (might have some problems with duplicated values):

Names Default To Here(1);
dt = Current Data Table();
dt << New Column("Overlap", Numeric, Continuous);
dt << Sort(By(:"Start time (UTC)"n), Replace Table, Order(Ascending));

starts = dt[0, 1];
ends = dt[0, 2];

m = Sort Ascending(starts |/ ends);
For Each Row(dt,
	start = :"Start time (UTC)"n;
	end = :"End time (UTC)"n;

	:Overlap = Loc(m, end)[1] - Loc(m, start)[1] - 1;
);

/*
dt << New Column("MaxOverlap", Numeric, Continuous, << Set Each Value(
	Col Max(:Overlap, :Location)
));
*/

tab = dt << Tabulate(
	Show Control Panel(0),
	Add Table(
		Column Table(Statistics(Min, Mean, Max), Analysis Columns(:Overlap)),
		Row Table(Grouping Columns(:Location))
	)
);

jthi_4-1651248060337.png

 

You should be able to do this also almost interactively in JMP by creating new datatable with three columns: Time, TypeOfTime(start,end) and original row number. Then sort it by ascending and write a formula to check for overlaps (Could maybe use Lag(), I used Col Min(Row(), :Row) and Col Max(Row(), :Row()) and their difference). Finally create summary table to get max value / join back to original table based on row numbers.

jthi_0-1651247745080.png

Summary (didn't add max here):

jthi_1-1651247761935.png

There seems to be slight differences between the two methods, so maybe one of them is wrong. Maybe caused by duplicated values (I did also delete missing values from interactive method)

jthi_5-1651248189261.png

Much easier to look for possible issues when you have data table for it:

jthi_3-1651248014962.png

 

Also the interactive method can be scripted very easily with JMP16's enhanced log.

 

Edit: This most likely has following issues: 1. if one time starts before checked start time and ends after checked end time, it won't be calculated and 2. if the time starts and ends between checked times, it will be calculated twice. Second one should be fairly easy to fix but the first one will require a bit more.

Example of 2. OriginalRowOrder starts and ends inside OriginalRowOrder 10 and gets calculated twice:

jthi_2-1651316395258.png

 

Example of 1. OriginalRowOrder 13 starts and ends outside 14:

jthi_1-1651316349066.png

 

Edit2: Didn't bother figuring out how those could be taken care of interactively so I fairly quickly wrote two formulas (hopefully they are correct...). Missing values must be removed / recoded before using these. OverlapsOutside might get a bit slow if done with formula without any additional optimizations.

OverlapsInside (should solve problem2):

N Items(Associative Array(:OriginalRowOrder[Index(:MinRowForRow, :MaxRowForRow)])) - 1

OverlapsOutside (should solve problem1):

If(Row() == 1,
	m = :OriginalRowOrder << Get As Matrix
);
Try(
	starts = Associative Array(m[Index(1, :MinRowForRow - 1)]);
	ends = Associative Array(m[Index(:MaxRowForRow + 1, N Rows())]);
	starts << Intersect(ends);
	N Items(starts);
,
	.
);

OverlapsInside might be enough, depending on how you want to calculate overlaps. If you need both, then just get sum of those values.

jthi_0-1651317678778.png

jthi_1-1651317692377.png

 

 

-Jarmo

View solution in original post

9 REPLIES 9
ron_horne
Super User (Alumni)

Re: Overlap / Count of periods at the same point of time

Hi @ubpk ,

perhaps if you give us a small sample in a table i can give it a try.

otherwise, have a look at the attached table and image and let us know if this is in the right direction to start. if you run the script in the data table you get the graph.

 

All the best,

Ron

Untitled 3 - Graph Builder.png

ubpk
Level II

Re: Overlap / Count of periods at the same point of time

here's a reduced dataset with only two days, but it should do the job. I do not necessarily need a graphical view, but for debugging it would be great to have some kind of control on whether the number of parallel events is reasonable.

Thanks in advance!!

 

Ulf

jthi
Super User

Re: Overlap / Count of periods at the same point of time

If the idea is to only get maximum amount of overlaps for location, you could first calculate all overlaps by adding start and end times to a matrix and then sorting it while keeping track what is end and what is start. Then you can calculate how many values fit between each start-end pair.

 

Maybe something like this will work (might have some problems with duplicated values):

Names Default To Here(1);
dt = Current Data Table();
dt << New Column("Overlap", Numeric, Continuous);
dt << Sort(By(:"Start time (UTC)"n), Replace Table, Order(Ascending));

starts = dt[0, 1];
ends = dt[0, 2];

m = Sort Ascending(starts |/ ends);
For Each Row(dt,
	start = :"Start time (UTC)"n;
	end = :"End time (UTC)"n;

	:Overlap = Loc(m, end)[1] - Loc(m, start)[1] - 1;
);

/*
dt << New Column("MaxOverlap", Numeric, Continuous, << Set Each Value(
	Col Max(:Overlap, :Location)
));
*/

tab = dt << Tabulate(
	Show Control Panel(0),
	Add Table(
		Column Table(Statistics(Min, Mean, Max), Analysis Columns(:Overlap)),
		Row Table(Grouping Columns(:Location))
	)
);

jthi_4-1651248060337.png

 

You should be able to do this also almost interactively in JMP by creating new datatable with three columns: Time, TypeOfTime(start,end) and original row number. Then sort it by ascending and write a formula to check for overlaps (Could maybe use Lag(), I used Col Min(Row(), :Row) and Col Max(Row(), :Row()) and their difference). Finally create summary table to get max value / join back to original table based on row numbers.

jthi_0-1651247745080.png

Summary (didn't add max here):

jthi_1-1651247761935.png

There seems to be slight differences between the two methods, so maybe one of them is wrong. Maybe caused by duplicated values (I did also delete missing values from interactive method)

jthi_5-1651248189261.png

Much easier to look for possible issues when you have data table for it:

jthi_3-1651248014962.png

 

Also the interactive method can be scripted very easily with JMP16's enhanced log.

 

Edit: This most likely has following issues: 1. if one time starts before checked start time and ends after checked end time, it won't be calculated and 2. if the time starts and ends between checked times, it will be calculated twice. Second one should be fairly easy to fix but the first one will require a bit more.

Example of 2. OriginalRowOrder starts and ends inside OriginalRowOrder 10 and gets calculated twice:

jthi_2-1651316395258.png

 

Example of 1. OriginalRowOrder 13 starts and ends outside 14:

jthi_1-1651316349066.png

 

Edit2: Didn't bother figuring out how those could be taken care of interactively so I fairly quickly wrote two formulas (hopefully they are correct...). Missing values must be removed / recoded before using these. OverlapsOutside might get a bit slow if done with formula without any additional optimizations.

OverlapsInside (should solve problem2):

N Items(Associative Array(:OriginalRowOrder[Index(:MinRowForRow, :MaxRowForRow)])) - 1

OverlapsOutside (should solve problem1):

If(Row() == 1,
	m = :OriginalRowOrder << Get As Matrix
);
Try(
	starts = Associative Array(m[Index(1, :MinRowForRow - 1)]);
	ends = Associative Array(m[Index(:MaxRowForRow + 1, N Rows())]);
	starts << Intersect(ends);
	N Items(starts);
,
	.
);

OverlapsInside might be enough, depending on how you want to calculate overlaps. If you need both, then just get sum of those values.

jthi_0-1651317678778.png

jthi_1-1651317692377.png

 

 

-Jarmo
jthi
Super User

Re: Overlap / Count of periods at the same point of time

I noticed some issues with my initial approach, so I did some rethinking and wrote new script, hoping that it is now correct. This could be most likely made faster by using only matrix calculations, but I didn't bother doing it now. It does leave quite a few data tables open on purpose, so it is easier to check if the values are correct

Names Default To Here(1);
dt = Current Data Table();
dt << Clear Select;
dt << Sort(By(:"Start time (UTC)"n), Replace Table, Order(Ascending));
dt << New Column("Overlap", Numeric, Continuous);
dt << New Column("OriginalRowOrder", Numeric, Continuous, <<Set Each Value(Row()));

dt_stack = dt << Stack(
	columns(:"Start time (UTC)"n, :"End time (UTC)"n),
	Source Label Column("Type"),
	Stacked Data Column("Time"),
	"Non-stacked columns"n(Keep(:OriginalRowOrder)),
	Output Table("Overlap_support_table")
);
dt_stack << Sort(By(:Time), Replace Table, Order(Ascending));
dt_stack << Select Where(IsMissing(:Time)) << Delete Rows << Clear Select;
dt_stack << New Column("MinRowForRow", Numeric, "Continuous", Format("Best", 12), Formula(Col Min(Row(), :OriginalRowOrder)));
dt_stack << New Column("MaxRowForRow", Numeric, "Continuous", Format("Best", 12), Formula(Col Max(Row(), :OriginalRowOrder)));

dt_overlaps = dt_stack << Summary(
	Group(:OriginalRowOrder, :MinRowForRow, :MaxRowForRow),
	Freq("None"),
	Weight("None"),
	Link to original data table(0),
	Output table("Overlaps")
);
// Close(dt_stack, no save);
Try(dt_overlaps << Delete Columns("N Rows"));

// these are calculated in one go, because this might be a bit faster
// you will lose the knowledge of type and count of overlaps though
// so splitting these into two different columns (three with sum) might be a good idea
dt_overlaps << New Column("Overlaps",
	Numeric,
	"Continuous",
	Format("Best", 12),
	Formula(
		If(Row() == 1,
			starts = :MinRowForRow << Get As Matrix;
			ends = :MaxRowForRow << Get as Matrix;
		);
		inside = N Rows(Loc(:MinRowForRow < starts < :MaxRowForRow | :MinRowForRow < ends < :MaxRowForRow));
		outside = Sum(starts < :MinRowForRow & ends > :MaxRowForRow);
		inside + outside; // if you need only inside, then don't calculate outside overlaps and just return inside
	)
);

dt_overlaps << Update(
	With(dt),
	Match Columns(:OriginalRowOrder = :OriginalRowOrder),
	Add Columns from Update Table(
		:"Start time (UTC)"n, :"End time (UTC)"n, :Location
	)
);

dt_statistics = dt_overlaps << Summary(
	Group(:Location),
	Min(:Overlaps),
	Max(:Overlaps),
	Range(:Overlaps),
	Mean(:Overlaps),
	Median(:Overlaps),
	Freq("None"),
	Weight("None"),
	statistics column name format("stat"),
	Link to original data table(0),
	output table name("Overlap statistics by location")
);

f = Function({a},
	selected_rows = dt_overlaps << Get Selected Rows;
	dt_stack << Select Where(Contains(dt_overlaps[selected_rows, "OriginalRowOrder"], :OriginalRowOrder))
);
rs = dt_overlaps << make row state handler(f);
-Jarmo
ubpk
Level II

Re: Overlap / Count of periods at the same point of time

Hi jthi,

 

first of all many thanks for your effort! I will check the solution as soon as possible and will give you feedback!!

 

Cheers

Ulf

ubpk
Level II

Re: Overlap / Count of periods at the same point of time

Hi,

 

many thanks for your code. I checked it and it might be helpful for other purposes. BUT: Your idea

 

"You should be able to do this also almost interactively in JMP by creating new datatable with three columns: Time, TypeOfTime(start,end) and original row number. Then sort it by ascending and write a formula to check for overlaps (Could maybe use Lag(), I used Col Min(Row(), :Row) and Col Max(Row(), :Row()) and their difference). Finally create summary table to get max value / join back to original table based on row numbers."

 

was simple and great! I just stacked, ordered by time stamps and implemented a formula channel simply returning +1/-1 based on whether it is start or end and then sum up these in another channel. Works like a charm!

jthi
Super User

Re: Overlap / Count of periods at the same point of time

There are just few problems with that method (that I did notice later), but luckily as it is interactive method, you can fairly easily see them from data (I added some example images). These also might not be an issue, depending on what is being calculated.

 

1. if one time starts before checked start time and ends after checked end time, it won't be calculated

2. if the time starts and ends between checked times, it will be calculated twice

-Jarmo
txnelson
Super User

Re: Overlap / Count of periods at the same point of time

I suggest that you download the Gantt Chart Addin.  It will provide you with a great way to display your events.

txnelson_0-1651179692492.png

 

Jim
ubpk
Level II

Re: Overlap / Count of periods at the same point of time

Hi @txnelson,

 

Thanks for the hint, will download the add in since it looks quite usable. But in my case, the y-axis would be too overloaded due to the amount of locations I have.