cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Submit your abstract to the call for content for Discovery Summit Americas by April 23. Selected abstracts will be presented at Discovery Summit, Oct. 21- 24.
Discovery is online this week, April 16 and 18. Join us for these exciting interactive sessions.
Choose Language Hide Translation Bar
cjw99
Level I

Label the first event for each patient in a large dataset

I have a medical dataset where each row represents an event for a patient. Some patients have only one event, some have multiple events (which are represented by several rows, each with the same "Tx ID" (see picture below), however, I just want to create a column which labels the first event for each patient.

3 ACCEPTED SOLUTIONS

Accepted Solutions

Re: Label the first event for each patient in a large dataset

Hi @cjw99 -
There are a number of ways to do this. One way that just uses a column formula is to use Col Rank()

scott_allen_0-1707244125626.png

 

This ranks the date (earliest is lowest rank) by the ID and checks if it is 1 ( returning 1 or 0 ). 

In a similar data table, you will get this output:

scott_allen_1-1707244239691.png

 

The nice thing about Col Rank is that you don't have to have your data table sorted, it will work if you jumble the rows.

-Scott

View solution in original post

txnelson
Super User

Re: Label the first event for each patient in a large dataset

You can accomplish this by

  1. Select the TX ID column
  2. Go to pull down menu    Rows=>Row Selection=>Select Duplicate Rows
  3. Go to the RowState column and right click on one of the selected rows and select   "Invert Selection"
  4. In the RowState column, go to one of the rows that was selected after the Inversion, and right click and select   Label/UnLabel.

You will now have selected all first lrows for each different TX ID values.

Jim

View solution in original post

Re: Label the first event for each patient in a large dataset

@cjw99 

Here is one way to create your second table with two column formulas (or one slightly more complicated formula), however, it requires you to sort the data table. If the data table is in the order you display, this will work. If not, first sort the data table by Bx Date - Ascending, then TX ID - Ascending

 

The Number of Rejections column tracks the number of rejections by TX ID

scott_allen_0-1707397855235.png

 

The Final Formula column uses Number of Rejections to piece together an output similar to your target.

scott_allen_1-1707397922432.png

 

You can combine the two columns into one, however, my preference is to keep them separate in case you need to make modifications later. You can always hide the intermediate column if you don't want to see it.

Combined Column

scott_allen_2-1707398128460.png

 

scott_allen_3-1707398285335.png

 

If you want the text to match your TCMR Sequence column exactly, you can use a Value Label in the final column.

 

I'm sure there are other ways to get to your target column, but this was the first that came to mind using only column formulas.

-Scott

View solution in original post

7 REPLIES 7

Re: Label the first event for each patient in a large dataset

Hi @cjw99 -
There are a number of ways to do this. One way that just uses a column formula is to use Col Rank()

scott_allen_0-1707244125626.png

 

This ranks the date (earliest is lowest rank) by the ID and checks if it is 1 ( returning 1 or 0 ). 

In a similar data table, you will get this output:

scott_allen_1-1707244239691.png

 

The nice thing about Col Rank is that you don't have to have your data table sorted, it will work if you jumble the rows.

-Scott
cjw99
Level I

Re: Label the first event for each patient in a large dataset

Thank You! I knew there must be a simple way to do this

cjw99
Level I

Re: Label the first event for each patient in a large dataset

I don't want to abuse this service, but is there a way to use this function AND specify the rows ranked that meet a second condition from another column?

 

For example....in my dataset, the dates are biopsy dates. Some of these dates are associated with an outcome of interest (kidney transplant rejection) while others are not.

 

What I would like to be able to do is rank all the rows in order if they meet the rejection condition while skipping the other rows.

In the end, I would have a column that has labelled the 1st rejection, 2nd rejection, etc.

 

For example...like below. In this data "Bx Date" = biopsy date, and "TCMR Sequence" refers to T cell-mediated rejection.

Here I have already put them all in sequence in the TCMR sequence column but ideally, I would like to generate a formula that does this automatically for all 3,700 rows in the database.

I am familiar with creating simple If/Then formulas but I always struggle once the formula has to evaluate multiple rows for a given patient.

 

cjw99_0-1707250615073.png

 

 

cjw99
Level I

Re: Label the first event for each patient in a large dataset

In addition to what I wrote a moment ago....perhaps this data table makes more sense where a rejection yes/no column is present

 

cjw99_1-1707251240046.png

 

Re: Label the first event for each patient in a large dataset

@cjw99 

Here is one way to create your second table with two column formulas (or one slightly more complicated formula), however, it requires you to sort the data table. If the data table is in the order you display, this will work. If not, first sort the data table by Bx Date - Ascending, then TX ID - Ascending

 

The Number of Rejections column tracks the number of rejections by TX ID

scott_allen_0-1707397855235.png

 

The Final Formula column uses Number of Rejections to piece together an output similar to your target.

scott_allen_1-1707397922432.png

 

You can combine the two columns into one, however, my preference is to keep them separate in case you need to make modifications later. You can always hide the intermediate column if you don't want to see it.

Combined Column

scott_allen_2-1707398128460.png

 

scott_allen_3-1707398285335.png

 

If you want the text to match your TCMR Sequence column exactly, you can use a Value Label in the final column.

 

I'm sure there are other ways to get to your target column, but this was the first that came to mind using only column formulas.

-Scott
txnelson
Super User

Re: Label the first event for each patient in a large dataset

You can accomplish this by

  1. Select the TX ID column
  2. Go to pull down menu    Rows=>Row Selection=>Select Duplicate Rows
  3. Go to the RowState column and right click on one of the selected rows and select   "Invert Selection"
  4. In the RowState column, go to one of the rows that was selected after the Inversion, and right click and select   Label/UnLabel.

You will now have selected all first lrows for each different TX ID values.

Jim
cjw99
Level I

Re: Label the first event for each patient in a large dataset

Thank you.... I never noticed the select duplicate rows option before!