Solved: Re: Split Data Table Problem

shoffmeister · Oct 18, 2016 6:14 PM

I'm having an issue with the splitting of a data table and was wondering if someone knows how to it. My data looks like this:

The data should be Split by ID2 with ID being the group variable. When I am doing it that way I end up with this result:

The Problem is that for IDs 2 and 4 there are multiple values of C in column ID2. As it seems JMP is always using the last value in the splitted data table.

Is there any automatic way to do it so that I end up with a table like the following?

Here there are two rows for IDs 2 and 4 representing the different values of C in the original data.

Thanks for your help!

Sebastian

msharp · Nov 18, 2015 12:31 PM

Just combining/summarizing what was mentioned. Make sure you sort the data before hand or the Lag() function will have issues. Also, using a formula will be faster than a for each row() command. If you are worried about post sorting that may affect the lag() function later on, just delete the formula to avoid conflicts. Thus:

NamesDefaultToHere(1);

// Table

dt1 = New Table( "Sebastian",

New Column( "ID",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 2, 2] )

),

New Column( "ID2",

Character,

"Nominal",

Set Values(

{"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C", "C", "B"}

)

),

New Column( "Value",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values(

[111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

555, 666, 777, 888]

)

);

// the table needs to be sorted for the formula in the next step to work

dt1 << Sort(Replace table,

By( :ID, :ID2 ),

Order( Ascending, Ascending )

);

// Add an auxiliary column

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Formula( If(:ID2 != Lag(:ID2, 1) | Row() == 1, 1, Lag(:Repeat, 1) + 1) )

);

:Repeat << Delete Formula; // faster than 'for each row()'

// Split

dt2 = dt1 << Split(

Split By( :ID2 ),

Split( :Value ),

Group( :ID, :Repeat )

);

// Delete aauxiliary column

dt2 << deleteColumn(:Repeat);

View solution in original post

ian_jmp · Nov 18, 2015 04:36 AM

I think you need an auxiliary column, Sebastian.

NamesDefaultToHere(1);

// Table

dt1 = New Table( "Sebastian",

Add Rows( 16 ),

New Column( "ID",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] )

),

New Column( "ID2",

Character,

"Nominal",

Set Values(

{"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C"}

)

),

New Column( "Value",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values(

[111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

555, 666]

)

);

// Add an auxiliary column

Wait(3);

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2] )

);

// Split

dt2 = dt1 << Split(

Split By( :ID2 ),

Split( :Value ),

Group( :ID, :Repeat )

);

// Delete auxilliary column

Wait(3);

dt2 << deleteColumn(:Repeat);

shoffmeister · Nov 18, 2015 05:42 AM

Thanks for the support Ian!

That was more or less what I was doing. The problem in the real world is of course that the data set has ~3k rows and I could not imagine a formula to create the repeat-column automatically rather than manually.

Do you know a solution for that as well? I would like to avoid writing a script that nests multiple for-loops to generate the "repeat"-column. That sounds so inefficient.

Thanks,

Sebastian

ian_jmp · Nov 18, 2015 06:01 AM

Oops! Sorry, yes Sebastian. Maybe something like?

// Add an auxiliary column

Wait( 3 );

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Formula( If(:ID2 != Lag(:ID2, 1) | Row() == 1, 1, Lag(:Repeat, 1) + 1) )

);

ron_horne · Nov 18, 2015 06:27 AM

hi sebastianhoffmeister,

try this:

dt = New Table( "Original Data",

Add Rows( 16 ),

New Column( "ID",

Numeric,

Continuous,

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] )

),

New Column( "ID2",

Character,

Nominal,

Set Values(

{"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C"}

)

),

New Column( "Value",

Numeric,

Continuous,

Format( "Best", 12 ),

Set Values(

[111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

555, 666]

)

);

// the table needs to be sorted for the formula in the next step to work

dt << Sort(Replace table,

By( :ID, :ID2 ),

Order( Ascending, Ascending )

);

// introduce an new index that will recognize the duplications

dt << new column ("ID3");

for each row (If( // do not use a formula in the column since it is dependent on sorting that may change later.

Row() == 1, ID3 =:ID,

:ID != Lag( :ID, 1 ) | :ID2 != Lag( :ID2, 1 ),ID3 = :ID,

ID3 = Lag( :ID3, 1 ) + 1

));

// now you can split the table

dtsorted = dt << Split(

Split By( :ID2 ),

Split( :Value ),

Group( :ID3, :ID ),

Output Table( "split table" )

);

best,

ron

msharp · Nov 18, 2015 12:31 PM

Just combining/summarizing what was mentioned. Make sure you sort the data before hand or the Lag() function will have issues. Also, using a formula will be faster than a for each row() command. If you are worried about post sorting that may affect the lag() function later on, just delete the formula to avoid conflicts. Thus:

NamesDefaultToHere(1);

// Table

dt1 = New Table( "Sebastian",

New Column( "ID",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values( [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 2, 2] )

),

New Column( "ID2",

Character,

"Nominal",

Set Values(

{"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C", "C", "B"}

)

),

New Column( "Value",

Numeric,

"Nominal",

Format( "Best", 12 ),

Set Values(

[111, 222, 333, 444, 111, 222, 333, 555, 111, 222, 555, 444, 111, 222,

555, 666, 777, 888]

)

);

// the table needs to be sorted for the formula in the next step to work

dt1 << Sort(Replace table,

By( :ID, :ID2 ),

Order( Ascending, Ascending )

);

// Add an auxiliary column

dt1 << New Column( "Repeat",

Numeric,

"Nominal",

Format( "Best", 12 ),

Formula( If(:ID2 != Lag(:ID2, 1) | Row() == 1, 1, Lag(:Repeat, 1) + 1) )

);

:Repeat << Delete Formula; // faster than 'for each row()'

// Split

dt2 = dt1 << Split(

Split By( :ID2 ),

Split( :Value ),

Group( :ID, :Repeat )

);

// Delete aauxiliary column

dt2 << deleteColumn(:Repeat);

ron_horne · Oct 28, 2016 6:22 AM

thanks msharp,

by the way, in a previous post @MS offerd this formula for indexing :Sum( :ID[Index( 1, Row() )] == :ID )@ms offerd this formula for indexing :Sum( :ID[Index( 1, Row() )] == :ID )

Re: Syntax: How do I create an index column by ID

is there a way for using it in this context? with two index columns and one of them is alphabetic?

best,

ron

msharp · Nov 18, 2015 04:19 PM

I would stay away from this approach in general, static data is generally safer than dynamic data. You have to ask yourself how important is the order of the data? This solution will keep the keys in order, but the values are free to roam. BUT...

That solution will only work with numeric columns. If we were to replace :ID2 with :ID2Num where A's = 1, B's =2 ect. you could simply use an & statement, thus :Sum( :ID[Index( 1, Row() )] == :ID & :ID2Num[Index( 1, Row() )] == :ID2Num ) for the formula.

If you have character columns you'll have to be more creative. Thus for the formula:

length(concat items(:ID2[Index( 1, Row() )],"")) - length(substitute(concat items(:ID2[Index( 1, Row() )],""), :ID2,""))

or

dt1 << New Column("Concat", Character, Nominal, Formula(:ID2 || char(:ID)));

dt1 << New Column("Repeat", Character, Nominal, Formula((length(concat items(:Concat[Index( 1, Row() )],"")) - length(substitute(concat items(:Concat[Index( 1, Row() )],""), :Concat,"")))/2)); //the 2 at the end accounts for the length of the concatenated string

There's a lot going on here, and I don't have the energy to explain it now. But let me know if you have questions; really, there's a more viable simpler solution above.

ron_horne · Nov 18, 2015 06:01 PM

thanks msharp,

it works great.

and you are write, keeping things simple is probably the best way to go.