A few questions:
I want to exclude Category 1 entries that do not have Category 2 or 3 counterparts.
Should ID=1 be removed? Or everything except ID=4, since it has all the categories?
span for Categories 1 to 3 should not exceed 1 month
Which date is this based on for each category, given that a category can have multiple entries? Min, max, mean, all, random? Should row 23 be kept, or should both rows 23 and 24 be thrown out?
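To show why the choice matters, here is a minimal sketch with made-up dates for a single ID (not taken from your data). The `records` list, the `span_days` helper and the 31-day reading of "1 month" are all my own assumptions for illustration:

```python
from datetime import date

# Hypothetical (category, date) entries for one ID; values are made up.
records = [
    (1, date(2020, 1, 5)),
    (2, date(2020, 1, 20)),
    (2, date(2020, 3, 1)),   # second Category 2 entry, much later
    (3, date(2020, 1, 30)),
]

MAX_SPAN_DAYS = 31  # assumption: "1 month" interpreted as 31 days


def span_days(records, pick):
    """Span in days across categories, using `pick` (e.g. min or max)
    to choose a single date per category."""
    by_cat = {}
    for cat, d in records:
        by_cat.setdefault(cat, []).append(d)
    chosen = [pick(dates) for dates in by_cat.values()]
    return (max(chosen) - min(chosen)).days


span_earliest = span_days(records, min)  # earliest date per category
span_latest = span_days(records, max)    # latest date per category

print(span_earliest, span_earliest <= MAX_SPAN_DAYS)  # 25 True
print(span_latest, span_latest <= MAX_SPAN_DAYS)      # 56 False
```

With the earliest date per category this ID passes the 1 month rule, with the latest date it fails, so the answer decides which rows survive.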
Could you provide a dataset that contains all the edge cases you can think of, plus some simple cases, so that solutions can be tested?
-Jarmo