Discussions

BHarris · Apr 22, 2025 01:14 PM

Suppose I have this table:

Category	Item	Level
A	1	delta
A	2	foxtrot
A	3	hotel
B	1	juliet
B	3	lima

Note that Category=B, Item=2 is missing.

If I split this table with "Split By" = Category, "Split Columns" = Level, and "Keep All" selected, it returns this:

Item	A	B
1	delta	juliet
2	foxtrot	lima
3	hotel

Note that the new split table shows "lima" for Item=2, Category=B, but that's incorrect -- that cell should be blank, and "lima" should be under Item=3, Category=B.

Is this user error, or is this a bug in JMP?

Chris_Kirchberg · Apr 22, 2025 01:45 PM

Hi @BHarris, try putting Item in the Group role. The result will be what you expect.

Best,

Chris Kirchberg, M.S.²
Data Scientist, Life Sciences - Global Technical Enablement
JMP Statistical Discovery, LLC. - Denver, CO
Tel: +1-919-531-9927 ▪ Mobile: +1-303-378-7419 ▪ E-mail: chris.kirchberg@jmp.com
www.jmp.com

BHarris · Apr 22, 2025 01:49 PM

Yes, that does seem to work.

Can you explain what's going on? I'd like to understand why the Group step was necessary.

Chris_Kirchberg · Apr 22, 2025 02:11 PM

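Sure. Without a Group variable, Split works literally, row by row: within each Category, the first Level value goes to the first output row, the second to the second, and so on. It is not aware that Category B has no row for Item=2, so "lima" moves up into the second row. Specifying Item as the Group variable forces Level to be split out by Item within each Category, because Split then looks at Item first and Category second.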
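To make the difference concrete, here is a minimal Python sketch that models both behaviors on the table from the original question. It is an illustration under stated assumptions, not JMP's implementation: `split_by_position` and `split_by_group` are hypothetical names, and the positional rule (the k-th row of each Category fills output row k) is inferred from the output shown in the question.

```python
# Hypothetical model of the two Split behaviors; NOT JMP's source code.
from collections import defaultdict

# The example table from the original question: (Category, Item, Level)
ROWS = [
    ("A", 1, "delta"),
    ("A", 2, "foxtrot"),
    ("A", 3, "hotel"),
    ("B", 1, "juliet"),
    ("B", 3, "lima"),
]


def split_by_position(rows):
    """Assumed no-Group behavior: the k-th row of each Category lands in output row k."""
    columns = defaultdict(list)
    for category, _item, level in rows:  # Item is ignored entirely
        columns[category].append(level)
    n_out = max(len(values) for values in columns.values())
    # Categories with fewer rows are padded with blanks at the bottom
    return [
        {cat: (values[k] if k < len(values) else "") for cat, values in columns.items()}
        for k in range(n_out)
    ]


def split_by_group(rows):
    """Assumed Group = Item behavior: each distinct Item value defines one output row."""
    categories = sorted({category for category, _, _ in rows})
    out = {}
    for category, item, level in rows:
        out.setdefault(item, {cat: "" for cat in categories})[category] = level
    return dict(sorted(out.items()))


for k, row in enumerate(split_by_position(ROWS), start=1):
    print("output row", k, row)
for item, row in split_by_group(ROWS).items():
    print("Item", item, row)
```

In the positional model "lima" lands in output row 2, which reproduces the reported result; in the grouped model it lands under Item=3 and the Item=2, Category=B cell stays blank.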

From the online help (https://www.jmp.com/support/help/en/18.2/?os=mac&source=application#page/jmp/split-columns-in-data-t...):

Group

Specifies a Group variable when you want your data to be split within each group of the selected variable. Each group results in a row in the output table. You must also specify the required variables, Split By, and Split Columns.

Note: If the variable that you want to group by contains unequal groups or is in a random order, specifying it as the Group variable ensures that your data is restructured properly, and any missing values are assigned in the appropriate places.

Otherwise, you would have to add an extra row with Category and Item filled in (here, Category=B and Item=2) but Level left blank.
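The placeholder-row workaround can be sketched in Python under the same assumed positional model (the k-th row of each Category fills output row k). `pad_missing` and `split_by_position` are hypothetical helpers, not JMP functions.

```python
# Hypothetical sketch of the "add a placeholder row" workaround; NOT JMP's code.
from collections import defaultdict

# The example table from the original question: (Category, Item, Level)
ROWS = [
    ("A", 1, "delta"),
    ("A", 2, "foxtrot"),
    ("A", 3, "hotel"),
    ("B", 1, "juliet"),
    ("B", 3, "lima"),
]


def pad_missing(rows):
    """Add a blank-Level row for every missing (Category, Item) pair, then sort."""
    present = {(category, item) for category, item, _ in rows}
    categories = sorted({category for category, _ in present})
    items = sorted({item for _, item in present})
    padded = list(rows) + [
        (category, item, "")
        for category in categories
        for item in items
        if (category, item) not in present
    ]
    # Sorting matters: the assumed positional split depends on row order within each Category
    return sorted(padded, key=lambda row: (row[0], row[1]))


def split_by_position(rows):
    """Assumed no-Group behavior: the k-th row of each Category lands in output row k."""
    columns = defaultdict(list)
    for category, _item, level in rows:
        columns[category].append(level)
    n_out = max(len(values) for values in columns.values())
    return [
        {cat: (values[k] if k < len(values) else "") for cat, values in columns.items()}
        for k in range(n_out)
    ]


for row in split_by_position(pad_missing(ROWS)):
    print(row)
```

The sort is the important design choice here: in this model, appending the (B, 2, blank) row at the bottom of the table without sorting would leave B's rows in the order juliet, lima, blank, and "lima" would still land in output row 2.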

Hope that helps.


BHarris · May 7, 2025 02:03 PM

The more I think about this, the less it makes sense. I don't understand why split would behave like that under any circumstance -- almost like the developers simply nested some loops and hoped that the data would fall in the right spots...

I'm left wondering if (a) I'm still too dumb to understand the underlying rationale for this behavior, (b) this was implemented this way to offer high performance (fast splits on large tables) where the inputs are full-factorial and the user is expected to know that assumption is being made, or (c) it's really just a bug that needs to be submitted/fixed.

(I won't be offended if it's (a). ;) )

hogi · May 7, 2025 05:01 PM

Can you elaborate on your thoughts?

I would argue in the opposite direction:
If the input data is well structured, you can Split without taking any special care.

If the input data is not well structured but carries additional grouping information, you can always use that grouping information to tell JMP what it should do.

What doesn't work:
the user knows the grouping information (the column is there) but doesn't tell JMP about it (via the GUI).
Then it's no wonder that JMP doesn't use the grouping information.

[There are other cases where JMP applies some guessing or auto-correction, and the user wonders about the creativity.]

BHarris · May 7, 2025 06:03 PM

What's your take on the data set provided in the original question? Is it "well structured"?

And what's your take on JMP's current behavior when splitting that table?

My understanding is that "Split By" columns are those with values that you want to end up in the new table's column headers, and "Split Columns" are the columns whose values you want to end up in those new columns' cells. I still don't understand what "grouping" means in this context.

txnelson · May 7, 2025 06:16 PM

Your initial data table is "well structured".

You can think of the Group concept in terms of which rows in the input data table are grouped together to define a specific output row.

Jim

hogi · May 7, 2025 10:37 PM

Another way to explain:

Your columns "Category" and "Level" alone are not well structured; one needs additional information, like

@BHarris wrote:
Note that Category=B, Item=2 is missing.

to inform others of the fact that something is missing between row #4 and row #5 (the B/1 and B/3 rows).

Case #1:
The user doesn't tell JMP about the existence of the column "Item" ("Item" is not used in the GUI **).
Then JMP doesn't have enough information to do the job properly.

Case #2:
There is a third drop zone, Group. It allows the user to use

@BHarris wrote:
Note that Category=B, Item=2 is missing.

in a structured way, as a grouping column.

"Grouping" fits very well with what you want:
it provides JMP with additional information so that entries are placed into specific rows, determined not by the Rank(row(), Categories) of the input data but by the values of the grouping columns.
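One way to picture this contrast is to list which input rows feed each output row. The Python sketch below is a hypothetical model, not JMP internals: `rows_by_rank` stands in for the Rank(row(), Categories) idea and `rows_by_group` for using Item as the grouping column. Row numbers are 1-based, as in a JMP data table.

```python
# Hypothetical illustration of rank-based vs grouped row assignment; NOT JMP's code.
from collections import defaultdict

# The example table from the original question: (Category, Item, Level)
ROWS = [
    ("A", 1, "delta"),
    ("A", 2, "foxtrot"),
    ("A", 3, "hotel"),
    ("B", 1, "juliet"),
    ("B", 3, "lima"),
]


def rows_by_rank(rows):
    """Output row = rank of the input row within its Category (row order only)."""
    seen = defaultdict(int)
    members = defaultdict(list)
    for row_number, (category, _item, _level) in enumerate(rows, start=1):
        seen[category] += 1  # running rank of this row inside its Category
        members[seen[category]].append(row_number)
    return dict(members)


def rows_by_group(rows):
    """Output row = value of the grouping column (Item)."""
    members = defaultdict(list)
    for row_number, (_category, item, _level) in enumerate(rows, start=1):
        members[item].append(row_number)
    return dict(sorted(members.items()))


print(rows_by_rank(ROWS))
print(rows_by_group(ROWS))
```

Input row 5 (B, 3, lima) joins output row 2 in the rank-based model but the Item=3 row in the grouped model, which matches the difference reported in the original question.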

**) Edit:
"Item" is used in the GUI, implicitly, via Keep All,
but this is too "passive"; see below.

BHarris · May 9, 2025 05:46 PM

@hogi -- you've jarred something loose here, the idea that I wasn't including the Item column anywhere in the interface, and therefore there's an implication that maybe the Split algorithm isn't even really aware of its existence.

I think I believed that *all* columns were important in the Split operation, and that it was trying to maintain all relationships where possible, e.g. that "lima" was also associated only with Item 3. Now I'm left trying to figure out whether all of the other columns should be grouping columns if I'm using "Keep All"...

Someday I hope to achieve the status of JMP-Master, and be fully enlightened, and perhaps then it will make sense why JMP doesn't implicitly use all other "Kept" columns in the Group role. Until then, it will likely continue to make me nervous, as its behavior for me in these conditions isn't obvious, results in faulty output data under conditions that may be hard to identify, and is not well-enough documented for me to understand. At least that nervousness will keep me more alert in the future when using it.

Thx.
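For conditions that "may be hard to identify", one hedged option is to check the table before splitting without a Group column. The Python function below is a hypothetical helper (not a JMP feature). It assumes the positional behavior described in this thread, where the k-th row of each Category lands in output row k, and reports whether every Category lists the same Item values in the same row order.

```python
# Hypothetical pre-flight check for an ungrouped Split; NOT a JMP feature.
from collections import defaultdict


def positional_split_is_safe(rows):
    """True if every Category lists the same Item values in the same row order.

    Under that condition the k-th row of each Category always carries the same
    Item, so a split without a Group column lines rows up by Item.
    """
    sequences = defaultdict(list)
    for category, item, _level in rows:
        sequences[category].append(item)
    return len({tuple(items) for items in sequences.values()}) <= 1


# The example table from the original question, and a copy with the missing row added
ORIGINAL = [("A", 1, "delta"), ("A", 2, "foxtrot"), ("A", 3, "hotel"),
            ("B", 1, "juliet"), ("B", 3, "lima")]
PADDED = ORIGINAL[:4] + [("B", 2, "")] + ORIGINAL[4:]

print(positional_split_is_safe(ORIGINAL))  # False: B has no Item 2
print(positional_split_is_safe(PADDED))    # True
```

When the check fails, putting Item in the Group role (as suggested in this thread) is the more direct fix than reordering or padding the table.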
