Subscribe Bookmark RSS Feed

Split Data Table Problem

shoffmeister

Community Trekker

Joined:

Mar 27, 2015

I'm having an issue with the splitting of a data table and was wondering if someone knows how to it. My data looks like this:

10505_Data Table.PNG

The data should be Split by ID2 ​with ID being  the group variable. When I am doing it that way I end up with this result:

10506_Splitted.PNG

The Problem is that for IDs 2 and 4 there are multiple values of C in column ID2. As it seems JMP is always using the last value in the splitted data table.

Is there any automatic way to do it so that I end up with a table like the following?

10507_correct.PNG

Here there are two rows for IDs 2 and 4 representing the different values of C in the original data.

Thanks for your help!

Sebastian

1 ACCEPTED SOLUTION

Accepted Solutions
Solution

Just combining/summarizing what was mentioned. Make sure you sort the data before hand or the Lag() function will have issues.  Also, using a formula will be faster than a for each row() command.  If you are worried about post sorting that may affect the lag() function later on, just delete the formula to avoid conflicts.  Thus:

NamesDefaultToHere(1);

// Table

dt1 = New Table( "Sebastian",

New Column( "ID",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 2, 2] )

),

New Column( "ID2",

Character,

"Nominal",

Set Values(

{"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C", "C", "B"}

)

),

New Column( "Value",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values(

[111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

555, 666, 777, 888]

)

)

);

// the table needs to be sorted for the formula in the next step to work

dt1 << Sort(Replace table,

       By( :ID, :ID2 ),

       Order( Ascending, Ascending )

);

// Add an auxiliary column

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Formula( If(:ID2 != Lag(:ID2, 1) | Row() == 1, 1, Lag(:Repeat, 1) + 1) )

);

:Repeat << Delete Formula; // faster than 'for each row()'

// Split

dt2 = dt1 << Split(

Split By( :ID2 ),

Split( :Value ),

Group( :ID, :Repeat )

);

// Delete aauxiliary column

dt2 << deleteColumn(:Repeat);

8 REPLIES
ian_jmp

Staff

Joined:

Jun 23, 2011

I think you need an auxiliary column, Sebastian.

NamesDefaultToHere(1);

// Table

dt1 = New Table( "Sebastian",

Add Rows( 16 ),

New Column( "ID",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] )

),

New Column( "ID2",

Character,

"Nominal",

Set Values(

{"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C"}

)

),

New Column( "Value",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values(

[111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

555, 666]

)

)

);

// Add an auxiliary column

Wait(3);

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2] )

);

// Split

dt2 = dt1 << Split(

Split By( :ID2 ),

Split( :Value ),

Group( :ID, :Repeat )

);

// Delete auxilliary column

Wait(3);

dt2 << deleteColumn(:Repeat);

shoffmeister

Community Trekker

Joined:

Mar 27, 2015

Thanks for the support Ian!

That was more or less what I was doing. The problem in the real world is of course that the data set has ~3k rows and I could not imagine a formula to create the repeat-column automatically rather than manually.

Do you know a solution for that as well? I would like to avoid writing a script that nests multiple for-loops to generate the "repeat"-column. That sounds so inefficient.

Thanks,

Sebastian

ian_jmp

Staff

Joined:

Jun 23, 2011

Oops! Sorry, yes Sebastian. Maybe something like?

// Add an auxiliary column

Wait( 3 );

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Formula( If(:ID2 != Lag(:ID2, 1) | Row() == 1, 1, Lag(:Repeat, 1) + 1) )

);



ron_horne

Super User

Joined:

Jun 23, 2011

hi sebastianhoffmeister​,

try this:

dt = New Table( "Original Data",

     Add Rows( 16 ),

     New Column( "ID",

          Numeric,

          Continuous,

          Format( "Best", 12 ),

          Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] )

     ),

     New Column( "ID2",

          Character,

          Nominal,

          Set Values(

              {"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

              "C", "C"}

          )

     ),

     New Column( "Value",

          Numeric,

          Continuous,

          Format( "Best", 12 ),

          Set Values(

              [111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

              555, 666]

          )

     )

);

// the table needs to be sorted for the formula in the next step to work

dt << Sort(Replace table,

     By( :ID, :ID2 ),

     Order( Ascending, Ascending )

);

// introduce an new index that will recognize the duplications

dt << new column ("ID3");

for each row (If(             // do not use a formula in the column since it is dependent on sorting that may change later.

     Row() == 1, ID3 =:ID,

     :ID != Lag( :ID, 1 ) | :ID2 != Lag( :ID2, 1 ),ID3 = :ID,

     ID3 = Lag( :ID3, 1 ) + 1

));

// now you can split the table

dtsorted = dt << Split(

     Split By( :ID2 ),

     Split( :Value ),

     Group( :ID3, :ID ),

     Output Table( "split table" )

);

best,

ron

Solution

Just combining/summarizing what was mentioned. Make sure you sort the data before hand or the Lag() function will have issues.  Also, using a formula will be faster than a for each row() command.  If you are worried about post sorting that may affect the lag() function later on, just delete the formula to avoid conflicts.  Thus:

NamesDefaultToHere(1);

// Table

dt1 = New Table( "Sebastian",

New Column( "ID",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 2, 2] )

),

New Column( "ID2",

Character,

"Nominal",

Set Values(

{"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C", "C", "B"}

)

),

New Column( "Value",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values(

[111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

555, 666, 777, 888]

)

)

);

// the table needs to be sorted for the formula in the next step to work

dt1 << Sort(Replace table,

       By( :ID, :ID2 ),

       Order( Ascending, Ascending )

);

// Add an auxiliary column

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Formula( If(:ID2 != Lag(:ID2, 1) | Row() == 1, 1, Lag(:Repeat, 1) + 1) )

);

:Repeat << Delete Formula; // faster than 'for each row()'

// Split

dt2 = dt1 << Split(

Split By( :ID2 ),

Split( :Value ),

Group( :ID, :Repeat )

);

// Delete aauxiliary column

dt2 << deleteColumn(:Repeat);

ron_horne

Super User

Joined:

Jun 23, 2011

thanks msharp​,

by the way, in a previous post @MS offerd this formula for indexing      :Sum( :ID[Index( 1, Row() )] == :ID )@ms offerd this formula for indexing      :Sum( :ID[Index( 1, Row() )] == :ID )

Re: Syntax: How do I create an index column by ID

is there a way for using it in this context? with two index columns and one of them is alphabetic?

best,

ron

msharp

Super User

Joined:

Jul 28, 2015

I would stay away from this approach in general, static data is generally safer than dynamic data. You have to ask yourself how important is the order of the data?  This solution will keep the keys in order, but the values are free to roam.  BUT...

That solution will only work with numeric columns.  If we were to replace :ID2 with :ID2Num where A's = 1, B's =2 ect. you could simply use an & statement, thus :Sum( :ID[Index( 1, Row() )] == :ID & :ID2Num[Index( 1, Row() )] == :ID2Num ) for the formula.

If you have character columns you'll have to be more creative.  Thus for the formula:

length(concat items(:ID2[Index( 1, Row() )],"")) - length(substitute(concat items(:ID2[Index( 1, Row() )],""), :ID2,""))

or

dt1 << New Column("Concat", Character, Nominal, Formula(:ID2 || char(:ID)));

dt1 << New Column("Repeat", Character, Nominal, Formula((length(concat items(:Concat[Index( 1, Row() )],"")) - length(substitute(concat items(:Concat[Index( 1, Row() )],""), :Concat,"")))/2)); //the 2 at the end accounts for the length of the concatenated string

There's a lot going on here, and I don't have the energy to explain it now.  But let me know if you have questions; really, there's a more viable simpler solution above.

ron_horne

Super User

Joined:

Jun 23, 2011

thanks msharp​,

it works great.

and you are write, keeping things simple is probably the best way to go.