Discussions

jswislar · Nov 17, 2016 10:07 AM

I'm a new JMP user.

My data is currently in long format. In other words, subjects have multiple lines of data each. However, I need the data to be in wide format for survival analysis - 1 line of data per subject. When I try to use Transpose, JMP tells me "Cannot transpose. Data columns have different data types". The variables are a mix of continiuous and nominal. So, how am I supposed to do this? I was able to convert a different data set from wide to long without difficulty, but can't seem to do the reverse.

Current data is set up like this (fake data):

ID Var1 Var2 Outcome

001 1 1 0

001 1 0 1

002 0 1 0

002 0 1 1

003 1 1 0

I want it like this:

ID Var1_time1 Var2_time1 Var1_time2 Var2_time2.... Outcome_timex

001 1 1 1 1 1

002 0 1 0 1 1

003 1 1 . . 0

Thanks!

Jeff_Perkinson · Nov 21, 2016 04:47 PM

@ron_horne has the right solution but perhaps some pictures will help.

Tables -> Split will rearrange the data the way you want but it requires a column that identifies the new columns. In your case you want to create columns for each row within ID so you need a column which counts rows within ID. @ron_horne called this column Index.

You'll also need to make sure that your data is sorted by ID for this to work.

Here's the data that you presented in your first post in the form of a JMP data table.

To this data we need add the Index column. @ron_horne used a different formula for his Index column but I like something a little more straightforward.

The formula above is:

If( :ID != Lag( :ID, 1 ), 1, Lag( :Index, 1 ) + 1)

In prose this says that if the ID value in the current row isn't the same as the ID in the previous row (i.e. Lag ( :ID, 1)) then put a 1 in the Index column, otherwise take the value of the Index column from the previous row (i.e. Lag (:Index, 1))

As you can see, the Index column in the data table now counts the rows within ID.

Now we can use Tables->Split to split your Var1, Var2 and Outcome columns by Index and group the rows by ID.

That will give you this data table, which I think is what you want. It has new columns for each row within ID and one row per ID.

It might be easier to see a simpler example where we don't have the create the Split By column. Here's a video that shows that.

Let me know if I've missed the mark or you're still confused.

-Jeff

View solution in original post

txnelson · Nov 17, 2016 10:15 AM

What you want to do is to use
Tables==>Split
You can Split By ID and specify var1,var2 and outcome as the Split Columns

Jim

jswislar · Nov 17, 2016 02:42 PM

Thanks, Jim. That partly worked. However, instead of getting one row for each sbject ID (n=15,000), I ended up with 14 rows of data and each variable name has the SubjectID appended to it.

My data are organized SubjectID. Within that, subjects can have multiple lines of data (treatment codes) for each visit. In long format, there are 1-13 lines of data per subject.

Should multiple "split by" variables be used? Should "group by" be used?

Thanks,

Joe

txnelson · Nov 17, 2016 03:07 PM

Oops.....I think you can solve the issue if you use the Group and place your ID column in there too

If your data are sorted by ID, (all of the data for a given ID are together). If not sorted, then go to Tables==>Sort, and order the data by ID. Then selecting it as your "Split By" column, and your Var1, Var2, Outcome as your "Split Columns" you will get a data table with one row per value of Split By column, and then you will get Var1 001, Var1 002......Var1 013, Var2 001, Var2 002.......Var2 013, Outcome 001, Outcome 002......Outcome 013.
Given the fact that you can have up to 13 levels.

Jim

jswislar · Nov 17, 2016 03:29 PM

I'm not sure what's happening then. The data are sorted by ID. When I split by ID and use my vars as "split columns", I end up with 14 lines of data and a column for each id/variable combination.

I'm trying to get:

ID visit1 visit 2 .... visit13 outcome1 outcome2 ... outcome 13

001 9/13 10/20 ... 3/2 0 0 ... 1

002 7/04 8/12 ... 12/5 0 1 ... 1

What I'm getting is

Date1-ID001 Date1-ID002

Line1 9/13 7/04

Line2 10/20 8/12

Thanks,

Joe

ron_horne · Nov 17, 2016 12:15 PM

Jim is right. Yet, if you are doing this just to produce summaries of the data per ID you can save this step by using Tables >>> summary.

best,

Ron

jswislar · Nov 17, 2016 02:44 PM

Thanks, Ron, but I'm not doing simple summaries. I'm working on a survival analysis. My first step is to get the data into wide format.
Joe

ron_horne · Nov 17, 2016 07:22 PM

Sorry for not double checking first.

Try the following

names default to here(1);

// source data table

dt = New Table( "source",

Add Rows( 10 ),

New Column( "ID ",

Numeric,

Continuous,

Format( "Best", 5 ),

Set Values( [1, 1, 1, 2, 2, 3, 3, 4, 4, 4] )

),

New Column( "Var1 ",

Numeric,

Continuous,

Format( "Best", 12 ),

Set Values( [1, 1, 1, 0, 0, 1, 1, 0, 1, 0] )

),

New Column( "Var2 ",

Numeric,

Continuous,

Format( "Best", 12 ),

Set Values( [1, 1, 0, 1, 1, 1, 0, 1, 1, 0] )

),

New Column( "Outcome",

Numeric,

Continuous,

Format( "Best", 12 ),

Set Values( [0, 0, 1, 1, 1, 0, 0, 1, 1, 1] )

)

);

// add index column for splitting - no need to sort the table first

dt << new column ("index", formula(Sum( :ID[Index( 1, Row() )] == :ID ) ) );

// now split by index and group by ID

dt << Split(

Split By( :index ),

Split( :Var1, :Var2, :Outcome ),

Group( :ID )

);

jswislar · Nov 18, 2016 02:27 PM

OK, I'm new to JMP and have never used JSL, although I've done plenty of programming in SPSS, SAS and a few others. I'm not following the scripting at all in this example, especially the "set values". I think the next block of commands is trying to create new index variables for grouping prior to splitting.

I created a few variables when going from wide to long in another dataset, but in that case, every subject had the same number of observations.

ron_horne · Nov 18, 2016 03:39 PM

if you ran the whole script as is and the result is what you are looking for then you can reproduce it on your data set with just a few clicks.

1) add a new column and call it "index".

2) set its' formula to the following: Sum( :ID[Index( 1, Row() )] == :ID )

3) Tabls >> Split and fill it as so.

click OK and you should be done.

Best,

Ron

Discussions

How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Re: How do I Transpose data from Long to Wide format?

Recommended Articles