Subscribe Bookmark RSS Feed

How do I Transpose data from Long to Wide format?

jswislar

Occasional Contributor

Joined:

Nov 17, 2016

I'm a new JMP user.

My data is currently in long format. In other words, subjects have multiple lines of data each. However, I need the data to be in wide format for survival analysis - 1 line of data per subject. When I try to use Transpose, JMP tells me "Cannot transpose. Data columns have different data types". The variables are a mix of continiuous and nominal. So, how am I supposed to do this? I was able to convert a different data set from wide to long without difficulty, but can't seem to do the reverse.

 

Current data is set up like this (fake data):

 

ID     Var1     Var2    Outcome

001   1          1          0

001   1          1          0

001   1          0          1

002   0          1           0

002   0          1           1

003   1          1            0

 

I want it like this:

ID     Var1_time1  Var2_time1 Var1_time2 Var2_time2.... Outcome_timex

001   1                  1                  1                  1                     1

002   0                  1                  0                  1                      1

003   1                  1                  .                    .                      0       

 

Thanks!                   

1 ACCEPTED SOLUTION

Accepted Solutions
Jeff_Perkinson

Community Manager

Joined:

Jun 23, 2011

Solution

@ron_horne has the right solution but perhaps some pictures will help.

 

Tables -> Split will rearrange the data the way you want but it requires a column that identifies the new columns. In your case you want to create columns for each row within ID so you need a column which counts rows within ID. @ron_horne called this column Index.

 

You'll also need to make sure that your data is sorted by ID for this to work.

 

Here's the data that you presented in your first post in the form of a JMP data table.

 

JMPScreenSnapz058.png

 

To this data we need add the Index column. @ron_horne used a different formula for his Index column but I like something a little more straightforward. 

JMPScreenSnapz059.png

The formula above is:

 

If( :ID != Lag( :ID, 1 ), 1, Lag( :Index, 1 ) + 1)

 

In prose this says that if the ID value in the current row isn't the same as the ID in the previous row (i.e. Lag ( :ID, 1)) then put a 1 in the Index column, otherwise take the value of the Index column from the previous row (i.e. Lag (:Index, 1))

 

As you can see, the Index column in the data table now counts the rows within ID.

JMPScreenSnapz060.png

Now we can use Tables->Split to split your Var1, Var2 and Outcome columns by Index and group the rows by ID.

 

JMPScreenSnapz061.png

That will give you this data table, which I think is what you want. It has new columns for each row within ID and one row per ID.

JMPScreenSnapz062.png

It might be easier to see a simpler example where we don't have the create the Split By column. Here's a video that shows that.

 

 

Let me know if I've missed the mark or you're still confused.

 

-Jeff
13 REPLIES
txnelson

Super User

Joined:

Jun 22, 2012

What you want to do is to use
Tables==>Split
You can Split By ID and specify var1,var2 and outcome as the Split Columns
Jim
jswislar

Occasional Contributor

Joined:

Nov 17, 2016

Thanks, Jim. That partly worked. However, instead of getting one row for each sbject ID (n=15,000), I ended up with 14 rows of data and each variable name has the SubjectID appended to it.

 

My data are organized SubjectID. Within that, subjects can have multiple lines of data (treatment codes) for each visit. In long format, there are 1-13 lines of data per subject.

 

Should multiple "split by" variables be used? Should "group by" be used?

Thanks,

Joe

txnelson

Super User

Joined:

Jun 22, 2012

 

Oops.....I think you can solve the issue if you use the Group and place your ID column in there too

 

If your data are sorted by ID, (all of the data for a given ID are together).  If not sorted, then go to Tables==>Sort, and order the data by ID.  Then selecting it as your "Split By" column, and your Var1, Var2, Outcome as your "Split Columns" you will get a data table with one row per value of Split By column, and then you will get Var1 001, Var1 002......Var1 013, Var2 001, Var2 002.......Var2 013, Outcome 001, Outcome 002......Outcome 013.
Given the fact that you can have up to 13 levels.

Jim
jswislar

Occasional Contributor

Joined:

Nov 17, 2016

I'm not sure what's happening then. The data are sorted by ID. When I split by ID and use my vars as "split columns", I end up with 14 lines of data and a column for each id/variable combination.

 

I'm trying to get:

ID   visit1 visit 2 .... visit13 outcome1  outcome2 ... outcome 13

001 9/13  10/20 ... 3/2        0               0               ...  1

002 7/04  8/12 ...   12/5        0               1               ...  1

 

What I'm getting is

           Date1-ID001   Date1-ID002

Line1   9/13                 7/04

Line2   10/20                8/12

 

Thanks,

Joe

ron_horne

Super User

Joined:

Jun 23, 2011

Jim is right. Yet, if you are doing this just to produce summaries of the data per ID you can save this step by using Tables >>> summary.

 

best,

Ron

 

jswislar

Occasional Contributor

Joined:

Nov 17, 2016

Thanks, Ron, but I'm not doing simple summaries. I'm working on a survival analysis. My first step is to get the data into wide format.
Joe
ron_horne

Super User

Joined:

Jun 23, 2011

Sorry for not double checking first.

Try the following

 

names default to here(1);

// source data table

dt = New Table( "source",

     Add Rows( 10 ),

     New Column( "ID   ",

           Numeric,

           Continuous,

           Format( "Best", 5 ),

           Set Values( [1, 1, 1, 2, 2, 3, 3, 4, 4, 4] )

     ),

     New Column( "Var1   ",

           Numeric,

           Continuous,

           Format( "Best", 12 ),

           Set Values( [1, 1, 1, 0, 0, 1, 1, 0, 1, 0] )

     ),

     New Column( "Var2   ",

           Numeric,

           Continuous,

           Format( "Best", 12 ),

           Set Values( [1, 1, 0, 1, 1, 1, 0, 1, 1, 0] )

     ),

     New Column( "Outcome",

           Numeric,

           Continuous,

           Format( "Best", 12 ),

           Set Values( [0, 0, 1, 1, 1, 0, 0, 1, 1, 1] )

     )

);

 

 

// add index column for splitting - no need to sort the table first

dt << new column ("index", formula(Sum( :ID[Index( 1, Row() )] == :ID ) ) );

 

// now split by index and group by ID

dt << Split(

     Split By( :index ),

     Split( :Var1, :Var2, :Outcome ),

     Group( :ID )

);

 

 

 

 

 

 

 

 

 

jswislar

Occasional Contributor

Joined:

Nov 17, 2016

OK, I'm new to JMP and have never used JSL, although I've done plenty of programming in SPSS, SAS and a few others. I'm not following the scripting at all in this example, especially the "set values". I think the next block of commands is trying to create new index variables for grouping prior to splitting.

 

I created a few variables when going from wide to long in another dataset, but in that case, every subject had the same number of observations.

ron_horne

Super User

Joined:

Jun 23, 2011

if you ran the whole script as is and the result is what you are looking for then you can reproduce it on your data set with just a few clicks.

1) add a new column and call it "index".

2) set its' formula to the following: Sum( :ID[Index( 1, Row() )] == :ID )

3) Tabls >> Split and fill it as so.

image.png

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

click OK and you should be done.

 

Best,

Ron