Solved: JMP Data Arrangement

jmpquestions123 · Jun 10, 2023 4:35 PM

Hi all, I have a rather complex question on how to arrange data in jmp. Basically I am looking at patient follow-up and how long it will take them to develop a particular disease. I have 3 columns of data. Column 1 is each patient’s identifiable code. Nearly all patients are listed multiple times in column 1 as they had multiple follow-ups. Column 2 is a binary code on whether or not they have the disease (0=no disease, 1=disease) at that follow-up date. And column 3 is their follow-up date.

What I want to do is isolate when column 2 changes from 0—>1 for the first time for each patient (this can happen multiple times for each patient since they could have been treated and cured, thus resetting the disease back to 0, so I just want the first time 0-->1). I need both values because I am taking the average of their corresponding follow-up dates to approximate the date they were positive for the disease. The rest of the data I do not need.

I hope this makes sense, please let me know if I need to elaborate. Sorry in advance if this is not what JMP is meant to do. Thanks!

jthi · Aug 17, 2021 01:00 AM

Does the patient have only one 0 disease value before the first 1 disease value or can there be multiple 0 values before the first 1?

Edit:

Here are two possible solutions depending on the data:

In the first one sort the data first by patientid and time, then create two formula columns and check should have the information you are interested in. There might be some edge cases this will miss (like the last row).

Names Default To Here(1);
dt = New Table("Untitled",
	Add Rows(14),
	Compress File When Saved(1),
	New Column("PatientID",
		Numeric,
		"Nominal",
		Format("Best", 12),
		Set Values([1, 1, 1, 1, 2, 2, 1, 3, 3, 3, 3, 1, 1, 1]),
		Set Display Width(62)
	),
	New Column("Status",
		Numeric,
		"Nominal",
		Format("Best", 12),
		Set Values([0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0]),
		Set Display Width(50)
	),
	New Column("Time",
		Numeric,
		"Continuous",
		Format("m.d.y h:m", 19),
		Input Format("m.d.y h:m"),
		Set Values(
			[3660718210, 3660718260, 3660718320, 3660718380, 3660718440, 3660718500, 3660718560, 3660718620, 3660718680,
			3660718740, 3660718800, 3660718860, 3660718920, 3660718920]
		)
	)
);


// Sort data table
dt << Sort(By(:PatientID, :Time), Replace Table, Order(Ascending, Ascending));

//add formula columns
dt << New Column("CumSumOfStatus",
	Numeric,
	"Continuous",
	Format("Best", 12),
	Formula(Col Cumulative Sum(:Status, :PatientID))
);

//add formula columns
dt << New Column("Check",
	Numeric,
	"Continuous",
	Format("Best", 12),
	Formula(
		If(Row() == 1, :Status);
		If(Lag(:CumSumOfStatus, -1) == 1 & Lag(:PatientID) == :PatientID,
			1,
			0
		);
	)
);

Second one is expecting that patientid has only one 0 status before first 1 status which allows you to just get minimum times for both statuses using summary:

//If only one 0 before 1
dt2 = New Table("Untitled",
	Add Rows(11),
	Compress File When Saved(1),
	New Column("PatientID",
		Numeric,
		"Nominal",
		Format("Best", 12),
		Set Values([1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3]),
		Set Display Width(62)
	),
	New Column("Status",
		Numeric,
		"Nominal",
		Format("Best", 12),
		Set Values([0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]),
		Set Display Width(50)
	),
	New Column("Time",
		Numeric,
		"Continuous",
		Format("m.d.y h:m", 19),
		Input Format("m.d.y h:m"),
		Set Values(
			[3660718380, 3660718560, 3660718860, 3660718920, 3660718920, 3660718440, 3660718500, 3660718620, 3660718680,
			3660718740, 3660718800]
		)
	)
);

dt << Summary(Group(:PatientID), Min(:Time), Subgroup(:Status), Freq("None"), Weight("None"));

-Jarmo

View solution in original post

jthi · Aug 17, 2021 01:00 AM