turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- JMP User Community
- :
- Discussions
- :
- Discussions
- :
- Split Data Table Problem

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 18, 2015 1:05 AM
(5334 views)

I'm having an issue with the splitting of a data table and was wondering if someone knows how to it. My data looks like this:

The data should be Split by **ID2 **with **ID** being the group variable. When I am doing it that way I end up with this result:

The Problem is that for **ID**s 2 and 4 there are multiple values of C in column **ID2**. As it seems JMP is always using the last value in the splitted data table.

Is there any automatic way to do it so that I end up with a table like the following?

Here there are two rows for IDs 2 and 4 representing the different values of C in the original data.

Thanks for your help!

Sebastian

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 18, 2015 9:31 AM
(8996 views)
| Posted in reply to message from shoffmeister 11/18/2015 05:42 AM

Just combining/summarizing what was mentioned. Make sure you sort the data before hand or the Lag() function will have issues. Also, using a formula will be faster than a for each row() command. If you are worried about post sorting that may affect the lag() function later on, just delete the formula to avoid conflicts. Thus:

NamesDefaultToHere**(1)**;

// Table

dt1 = New Table**(** "Sebastian",

New Column**(** "ID",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Set Values**(** **[****1**, **1**, **1**, **1**, **2**, **2**, **2**, **2**, **3**, **3**, **3**, **3**, **4**, **4**, **4**, **4**, **2**, **2]** **)**

**)**,

New Column**(** "ID2",

Character,

"Nominal",

Set Values**(**

**{**"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C", "C", "B"**}**

**)**

**)**,

New Column**(** "Value",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Set Values**(**

**[****111**, **222**, **333**, **444**, **111**, **222**, **333**, **555**, **111**, **222**, **555**, **444**, **111**, **222**,

**555**, **666**, **777**, **888]**

**)**

**)**

**)**;

// the table needs to be sorted for the formula in the next step to work

dt1 << **Sort(**Replace table,

By**(** :ID, :ID2 **)**,

Order**(** Ascending, Ascending **)**

**)**;

// Add an auxiliary column

dt1 << **New Column(** "Repeat",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Formula**(** If**(**:ID2 != Lag**(**:ID2, **1)** | Row**()** == **1**, **1**, Lag**(**:Repeat, **1)** + **1)** **)**

**)**;

:Repeat << **Delete Formula**; // faster than 'for each row()'

// Split

dt2 = dt1 << **Split(**

Split By**(** :ID2 **)**,

Split**(** :Value **)**,

Group**(** :ID, :Repeat **)**

**)**;

// Delete aauxiliary column

dt2 << **deleteColumn(**:Repeat**)**;

8 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 18, 2015 1:36 AM
(5017 views)
| Posted in reply to message from shoffmeister 11/18/2015 04:05 AM

I think you need an auxiliary column, Sebastian.

NamesDefaultToHere**(****1****)**;

// Table

dt1 = New Table**(** "Sebastian",

Add Rows**(** **16** **)**,

New Column**(** "ID",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Set Values**(** **[****1**, **1**, **1**, **1**, **2**, **2**, **2**, **2**, **3**, **3**, **3**, **3**, **4**, **4**, **4**, **4****]** **)**

**)**,

New Column**(** "ID2",

Character,

"Nominal",

Set Values**(**

**{**"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C"**}**

**)**

**)**,

New Column**(** "Value",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Set Values**(**

**[****111**, **222**, **333**, **444**, **111**, **222**, **333**, **555**, **111**, **222**, **555**, **444**, **111**, **222**,

**555**, **666****]**

**)**

**)**

**)**;

// Add an auxiliary column

Wait**(****3****)**;

dt1 << **New Column****(** "Repeat",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Set Values**(** **[****1**, **1**, **1**, **1**, **1**, **1**, **1**, **2**, **1**, **1**, **1**, **1**, **1**, **1**, **1**, **2****]** **)**

**)**;

// Split

dt2 = dt1 << **Split****(**

Split By**(** :ID2 **)**,

Split**(** :Value **)**,

Group**(** :ID, :Repeat **)**

**)**;

// Delete auxilliary column

Wait**(****3****)**;

dt2 << **deleteColumn****(**:Repeat**)**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Thanks for the support Ian!

That was more or less what I was doing. The problem in the real world is of course that the data set has ~3k rows and I could not imagine a formula to create the repeat-column automatically rather than manually.

Do you know a solution for that as well? I would like to avoid writing a script that nests multiple for-loops to generate the "repeat"-column. That sounds so inefficient.

Thanks,

Sebastian

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 18, 2015 3:01 AM
(5017 views)
| Posted in reply to message from shoffmeister 11/18/2015 05:42 AM

Oops! Sorry, yes Sebastian. Maybe something like?

// Add an auxiliary column

Wait**(** **3** **)**;

dt1 << **New Column****(** "Repeat",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Formula**(** If**(**:ID2 != Lag**(**:ID2, **1****)** | Row**()** == **1**, **1**, Lag**(**:Repeat, **1****)** + **1****)** **)**

**)**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 18, 2015 3:27 AM
(5017 views)
| Posted in reply to message from shoffmeister 11/18/2015 05:42 AM

hi sebastianhoffmeister,

try this:

dt = New Table**(** "Original Data",

Add Rows**(** **16** **)**,

New Column**(** "ID",

Numeric,

Continuous,

Format**(** "Best", **12** **)**,

Set Values**(** **[****1**, **1**, **1**, **1**, **2**, **2**, **2**, **2**, **3**, **3**, **3**, **3**, **4**, **4**, **4**, **4]** **)**

**)**,

New Column**(** "ID2",

Character,

Nominal,

Set Values**(**

**{**"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C"**}**

**)**

**)**,

New Column**(** "Value",

Numeric,

Continuous,

Format**(** "Best", **12** **)**,

Set Values**(**

**[****111**, **222**, **333**, **444**, **111**, **222**, **333**, **555**, **111**, **222**, **555**, **444**, **111**, **222**,

**555**, **666]**

**)**

**)**

**)**;

// the table needs to be sorted for the formula in the next step to work

dt << **Sort(**Replace table,

By**(** :ID, :ID2 **)**,

Order**(** Ascending, Ascending **)**

**)**;

// introduce an new index that will recognize the duplications

dt << **new column** **(**"ID3"**)**;

for each row **(**If**(** // do not use a formula in the column since it is dependent on sorting that may change later.

Row**()** == **1**, ID3 =:ID,

:ID != Lag**(** :ID, **1** **)** | :ID2 != Lag**(** :ID2, **1** **)**,ID3 = :ID,

ID3 = Lag**(** :ID3, **1** **)** + **1**

**))**;

// now you can split the table

dtsorted = dt << **Split(**

Split By**(** :ID2 **)**,

Split**(** :Value **)**,

Group**(** :ID3, :ID **)**,

Output Table**(** "split table" **)**

**)**;

best,

ron

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Nov 18, 2015 9:31 AM
(8997 views)
| Posted in reply to message from shoffmeister 11/18/2015 05:42 AM

Just combining/summarizing what was mentioned. Make sure you sort the data before hand or the Lag() function will have issues. Also, using a formula will be faster than a for each row() command. If you are worried about post sorting that may affect the lag() function later on, just delete the formula to avoid conflicts. Thus:

NamesDefaultToHere**(1)**;

// Table

dt1 = New Table**(** "Sebastian",

New Column**(** "ID",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Set Values**(** **[****1**, **1**, **1**, **1**, **2**, **2**, **2**, **2**, **3**, **3**, **3**, **3**, **4**, **4**, **4**, **4**, **2**, **2]** **)**

**)**,

New Column**(** "ID2",

Character,

"Nominal",

Set Values**(**

**{**"A", "B", "C", "D", "A", "B", "C", "C", "A", "B", "C", "D", "A", "B",

"C", "C", "C", "B"**}**

**)**

**)**,

New Column**(** "Value",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Set Values**(**

**[****111**, **222**, **333**, **444**, **111**, **222**, **333**, **555**, **111**, **222**, **555**, **444**, **111**, **222**,

**555**, **666**, **777**, **888]**

**)**

**)**

**)**;

// the table needs to be sorted for the formula in the next step to work

dt1 << **Sort(**Replace table,

By**(** :ID, :ID2 **)**,

Order**(** Ascending, Ascending **)**

**)**;

// Add an auxiliary column

dt1 << **New Column(** "Repeat",

Numeric,

"Nominal",

Format**(** "Best", **12** **)**,

Formula**(** If**(**:ID2 != Lag**(**:ID2, **1)** | Row**()** == **1**, **1**, Lag**(**:Repeat, **1)** + **1)** **)**

**)**;

:Repeat << **Delete Formula**; // faster than 'for each row()'

// Split

dt2 = dt1 << **Split(**

Split By**(** :ID2 **)**,

Split**(** :Value **)**,

Group**(** :ID, :Repeat **)**

**)**;

// Delete aauxiliary column

dt2 << **deleteColumn(**:Repeat**)**;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

thanks msharp,

by the way, in a previous post @MS offerd this formula for indexing :Sum( :ID[Index( 1, Row() )] == :ID )@ms offerd this formula for indexing :Sum( :ID[Index( 1, Row() )] == :ID )

Re: Syntax: How do I create an index column by ID

is there a way for using it in this context? with two index columns and one of them is alphabetic?

best,

ron

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

I would stay away from this approach in general, static data is generally safer than dynamic data. You have to ask yourself how important is the order of the data? This solution will keep the keys in order, but the values are free to roam. BUT...

That solution will only work with numeric columns. If we were to replace :ID2 with :ID2Num where A's = 1, B's =2 ect. you could simply use an & statement, thus :Sum( :ID[Index( 1, Row() )] == :ID & :ID2Num[Index( 1, Row() )] == :ID2Num ) for the formula.

If you have character columns you'll have to be more creative. Thus for the formula:

length(concat items(:ID2[Index( 1, Row() )],"")) - length(substitute(concat items(:ID2[Index( 1, Row() )],""), :ID2,""))

or

dt1 << New Column("Concat", Character, Nominal, Formula(:ID2 || char(:ID)));

dt1 << New Column("Repeat", Character, Nominal, Formula((length(concat items(:Concat[Index( 1, Row() )],"")) - length(substitute(concat items(:Concat[Index( 1, Row() )],""), :Concat,"")))/2)); //the 2 at the end accounts for the length of the concatenated string

There's a lot going on here, and I don't have the energy to explain it now. But let me know if you have questions; really, there's a more viable simpler solution above.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

thanks msharp,

it works great.

and you are write, keeping things simple is probably the best way to go.