cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
doraemengs
Level II

Clarification on Categorizing the Same ID in Rows with Other Columns

 
Hi,

I have a dataset where the same ID appears in multiple rows, each with 1 to 3 categories. The columns range from costA to costZ, and I need to consolidate them into a single "Cost" column. My goal is to group by Category 1 according to specific criteria (1 if Category 1 < 2000, 2 if Category 1 = 2000 - 4000, 3 if Category 1 = 4000 - 6000, 4 if Category 1 = 6000 - 10000, 5 if Category 1 > 10000).

However, I want to exclude Category 1 entries that do not have Category 2 or 3 counterparts. Additionally, the time span for Categories 1 to 3 should not exceed 1 month (I have a date column for reference). It's important to note that one ID can have multiple Category 1 to 3 entries.

I've attempted to stack the data but encountered difficulties in properly grouping Category 1 with the other categories. My objective is to create a graph showing Category (x) versus Cost (y) by group.

Does anyone have any ideas or suggestions? Your assistance would be greatly appreciated. Thank you! (I've also attached the Excel file for reference.) I'm using JMP version 16.

Attached: 1 picture and Excel file

Screenshot 2024-03-26 at 12.27.55 AM.png

2 REPLIES 2
jthi
Super User

Re: Clarification on Categorizing the Same ID in Rows with Other Columns

Few questions:

I want to exclude Category 1 entries that do not have Category 2 or 3 counterparts.

jthi_0-1711467838486.png 

ID=1 should be removed? Or all expect for ID4 as it has all the categories

 

span for Categories 1 to 3 should not exceed 1 month

Is this based on which date found for categories as there can be multiple entries for them? Min, max, mean, all, random? Should row 23 be kept or should both row 23 and 24 be thrown out?

jthi_2-1711467998006.png

 

Could you provide a dataset which contains all possible edge-cases that come to your mind + simple cases so solutions can be tested?

-Jarmo
doraemengs
Level II

Re: Clarification on Categorizing the Same ID in Rows with Other Columns

Hi Jarmo, Thank you for your help. My criteria involve ensuring that each ID is associated with category 1 and then linked with category 2 and/or 3. If there are multiple entries with categories 1-3, I prioritize the most updated entry in each category. Additionally, I need to collapse all cost1 - cost10 columns into one cost column. Furthermore, I aim to categorize category 1 into five groups that are linked to category 2 and 3. Finally, I would like to generate a graph based on these criteria. Here is the new data table on excel that have more information. Thank you