I'm a new JMP user.
My data is currently in long format. In other words, subjects have multiple lines of data each. However, I need the data to be in wide format for survival analysis - 1 line of data per subject. When I try to use Transpose, JMP tells me "Cannot transpose. Data columns have different data types". The variables are a mix of continiuous and nominal. So, how am I supposed to do this? I was able to convert a different data set from wide to long without difficulty, but can't seem to do the reverse.
Current data is set up like this (fake data):
ID Var1 Var2 Outcome
001 1 1 0
001 1 1 0
001 1 0 1
002 0 1 0
002 0 1 1
003 1 1 0
I want it like this:
ID Var1_time1 Var2_time1 Var1_time2 Var2_time2.... Outcome_timex
001 1 1 1 1 1
002 0 1 0 1 1
003 1 1 . . 0
@ron_horne has the right solution but perhaps some pictures will help.
Tables -> Split will rearrange the data the way you want but it requires a column that identifies the new columns. In your case you want to create columns for each row within ID so you need a column which counts rows within ID. @ron_horne called this column Index.
You'll also need to make sure that your data is sorted by ID for this to work.
Here's the data that you presented in your first post in the form of a JMP data table.
To this data we need add the Index column. @ron_horne used a different formula for his Index column but I like something a little more straightforward.
The formula above is:
If( :ID != Lag( :ID, 1 ), 1, Lag( :Index, 1 ) + 1)
In prose this says that if the ID value in the current row isn't the same as the ID in the previous row (i.e. Lag ( :ID, 1)) then put a 1 in the Index column, otherwise take the value of the Index column from the previous row (i.e. Lag (:Index, 1))
As you can see, the Index column in the data table now counts the rows within ID.
Now we can use Tables->Split to split your Var1, Var2 and Outcome columns by Index and group the rows by ID.
That will give you this data table, which I think is what you want. It has new columns for each row within ID and one row per ID.
It might be easier to see a simpler example where we don't have the create the Split By column. Here's a video that shows that.
Let me know if I've missed the mark or you're still confused.
Thanks, Jim. That partly worked. However, instead of getting one row for each sbject ID (n=15,000), I ended up with 14 rows of data and each variable name has the SubjectID appended to it.
My data are organized SubjectID. Within that, subjects can have multiple lines of data (treatment codes) for each visit. In long format, there are 1-13 lines of data per subject.
Should multiple "split by" variables be used? Should "group by" be used?
Oops.....I think you can solve the issue if you use the Group and place your ID column in there too
If your data are sorted by ID, (all of the data for a given ID are together). If not sorted, then go to Tables==>Sort, and order the data by ID. Then selecting it as your "Split By" column, and your Var1, Var2, Outcome as your "Split Columns" you will get a data table with one row per value of Split By column, and then you will get Var1 001, Var1 002......Var1 013, Var2 001, Var2 002.......Var2 013, Outcome 001, Outcome 002......Outcome 013.
Given the fact that you can have up to 13 levels.
I'm not sure what's happening then. The data are sorted by ID. When I split by ID and use my vars as "split columns", I end up with 14 lines of data and a column for each id/variable combination.
I'm trying to get:
ID visit1 visit 2 .... visit13 outcome1 outcome2 ... outcome 13
001 9/13 10/20 ... 3/2 0 0 ... 1
002 7/04 8/12 ... 12/5 0 1 ... 1
What I'm getting is
Line1 9/13 7/04
Line2 10/20 8/12
Jim is right. Yet, if you are doing this just to produce summaries of the data per ID you can save this step by using Tables >>> summary.
Sorry for not double checking first.
Try the following
names default to here(1);
// source data table
dt = New Table( "source",
Add Rows( 10 ),
New Column( "ID ",
Format( "Best", 5 ),
Set Values( [1, 1, 1, 2, 2, 3, 3, 4, 4, 4] )
New Column( "Var1 ",
Format( "Best", 12 ),
Set Values( [1, 1, 1, 0, 0, 1, 1, 0, 1, 0] )
New Column( "Var2 ",
Format( "Best", 12 ),
Set Values( [1, 1, 0, 1, 1, 1, 0, 1, 1, 0] )
New Column( "Outcome",
Format( "Best", 12 ),
Set Values( [0, 0, 1, 1, 1, 0, 0, 1, 1, 1] )
// add index column for splitting - no need to sort the table first
dt << new column ("index", formula(Sum( :ID[Index( 1, Row() )] == :ID ) ) );
// now split by index and group by ID
dt << Split(
Split By( :index ),
Split( :Var1, :Var2, :Outcome ),
Group( :ID )
OK, I'm new to JMP and have never used JSL, although I've done plenty of programming in SPSS, SAS and a few others. I'm not following the scripting at all in this example, especially the "set values". I think the next block of commands is trying to create new index variables for grouping prior to splitting.
I created a few variables when going from wide to long in another dataset, but in that case, every subject had the same number of observations.
if you ran the whole script as is and the result is what you are looking for then you can reproduce it on your data set with just a few clicks.
1) add a new column and call it "index".
2) set its' formula to the following: Sum( :ID[Index( 1, Row() )] == :ID )
3) Tabls >> Split and fill it as so.
click OK and you should be done.