Selecting latest Date for same ID

Leyahans · Jun 8, 2023 9:38 AM

Hi all,

I have a problem here trying to figure out. I'm still new in JSL thus I'm stuck.
I have this table that gives me the ID of an item, the date it was processed and so on. For same ID, multiple reprocess could've been done and it causes the timestamp of DateTime to double/triple. I want to clean up my table and only maintain the ones with latest DateTime for same ID.

I know i should use loop for. but I'm stuck at how to make the ID find their partner in the list.

Note: The same ID might be in random line items.

Thank you for the help!

vince_faller · Feb 2, 2023 11:39 PM

Colmax() has a by variable you can give to it. I thik this does what you want. You can even give multiple by columns if you want.

Names default to here(1);
dt = New Table( "Untitled 101",
	Add Rows( 10 ),
	New Column( "ID",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values( [1, 1, 2, 2, 3, 3, 3, 3, 1, 1] ),
		Set Display Width( 51 )
	),
	New Column( "Datetime",
		Numeric,
		"Continuous",
		Format( "m/d/y h:m:s", 23, 0 ),
		Input Format( "m/d/y h:m:s", 0 ),
		Set Selected,
		Set Values(
			[3758218580, 3758222180, 3758225780, 3758229380, 3758232980, 3758236580,
			3758240180, 3758243780, 3758247380, 3758250980]
		)
	),
);
dt << New COlumn("Max by ID", 		
	Format( "m/d/y h:m:s", 23, 0 ),
	Formula(Colmax(:Datetime, :ID))
)

Vince Faller - Predictum

brady_brady · Feb 6, 2023 3:12 PM

Hi, if by "clean up my table and only maintain the ones with latest DateTime for same ID" you mean that you wish to keep only the most recent information, deleting all other rows, take a look at the "select duplicate rows" function. Your process then becomes:

1) sort the table by date--in this case, descending, as you want the most recent data on top

2) use the select duplicate rows function to select "duplicated" rows with respect to grouping columns of interest

3) delete these rows

To try this, run the first group of lines in the script below, then run the last 3 lines one by one.

Cheers,

Brady

Names default to here(1);
dt = astable (J(20,1,randominteger(1, 5)) || J(20,1,randominteger(3e9, 3.5e9) ),
	<< column names ({"group", "date"}));
dt:date <<  Format( "m/d/y h:m", 19 );

///////////////
dt << sort (by (:group, :date), order (ascending, descending), replacetable);
dt << Select Duplicate Rows( Match( :group ) );
dt << delete rows;

Selecting latest Date for same ID

Re: Selecting latest Date for same ID

Re: Selecting latest Date for same ID