I have a large databse of patients hospitalized for a certain condition. I have been trying to make a new column to indicate the "years prior to death". See below:
This is what my database looks like:
YEAR | patient | Mortality_in365days |
---|---|---|
1990 | 221 | |
1991 | 221 | |
1991 | 221 | |
1993 | 221 | |
1995 | 221 | * |
2000 | 789 | |
2001 | 789 | |
2002 | 789 | * |
2001 | 965 | |
2005 | 965 | * |
This is what I am trying to create:
YEAR | patient | Mortality_in365days | Death year | Years before death |
---|---|---|---|---|
1990 | 221 | 1995 | 5 | |
1991 | 221 | 1995 | 4 | |
1991 | 221 | 1995 | 4 | |
1993 | 221 | 1995 | 2 | |
1995 | 221 | * | 1995 | 0 |
2000 | 789 | 2002 | 2 | |
2001 | 789 | 2002 | 1 | |
2002 | 789 | * | 2002 | 0 |
2001 | 965 | 2005 | 4 | |
2005 | 965 | * | 2005 | 0 |
A star in mortality_in365days column means they died within the last 365 days. I know the years before death will not be an exact calculation since depending on how long ago they died (30 days versus 365 days) it may mean they died the year prior. I had set up JMP to make the death year column, and then make a simple subtraction formula to calculate years before death, but then I run into problems. I had done it manually using the update feature since I am not very good at scripting yet. If you can help me either with manual instructions or scripting please let me know.
Thanks.
Know little about JMP, however, it seems to me a 2X DUL will do the job as well:
data have;
infile cards missover;
input YEAR patient $ Mortality_in365days $;
cards;
1990 221
1991 221
1991 221
1993 221
1995 221 *
2000 789
2001 789
2002 789 *
2001 965
2005 965 *
;
data want;
do until (last.patient);
set have;
by patient year;
Death_year=ifn(last.patient,year,death_year);
retain death_year;
end;
do until (last.patient);
set have;
by patient year;
Years_Before_death=death_year-year;
output;
end;
run;
proc print;run;
Regards,
Haikuo
The maxiumum year for each patient obviously correspond to year of death. Manually you can then do a Summary with max year grouped by Patient. Then Update the original table with the summary using Patient as matching columns in both tables. Finally a simple subtraction (Death year - Year) would give the the last column.
Here is one slightly different way way to do it by script
dt = Data Table( "database" );
Summarize( p = by( :Patient ), maxyear = Max( :Year ) );
aa = Associative Array( Eval( p ), As List( maxyear ) );
col1 = dt << New Column( "Death year", numeric );
col2 = dt << New Column( "Years before death", numeric );
For Each Row(
col1[] = aa[Char( :Patient[] )];
col2[] = col1[] - :Year[];
);