Earendur
Level I

fix for two values in a single cell

What is the best way to fix having two values in the same cell?

Each row is assigned to a specific patient and the columns are genes with possible mutations. The problem is that a couple of patients have more than one mutation in the same gene (see the attached image). How do I best tell JMP that these are two separate values? I've tried making two rows with the same patient ID, but I'm afraid that will cause the program to count other mutations twice. Thank you.

Screen Shot 2021-02-18 at 1.27.19 PM.png

SDF1
Super User

Re: fix for two values in a single cell

Hi @Earendur ,

 

  You should probably create one column per mutation, named something like Mutation 1 and Mutation 2 (up to Mutation N if a gene can have N mutations). Then give each new column a formula that uses

Word(n, s, delim)

where n is the position of the mutation you want (1 for the first, 2 for the second, and so on) and s is the existing mutation column. The delim to use is ", " since the mutations are separated by a comma and then a space. That should get you what you want.
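For anyone who prefers to script it, here is a minimal JSL sketch of the Word() approach described above. The table, the patient IDs, and the column names Patient, Mut, Mut1, and Mut2 are made up for illustration and are not the poster's actual data. One caveat, as far as I know: Word() treats every character in the delimiter string as a delimiter, so ", " splits on commas and on spaces, which would also break up any mutation name that contains a space.

```jsl
Names Default To Here( 1 );

// Made-up stand-in for the real table; all names here are placeholders
dt = New Table( "Mutations (made up)",
	New Column( "Patient", Character, Set Values( {"P1", "P2", "P3"} ) ),
	New Column( "Mut", Character, Set Values( {"A", "A, B", "D"} ) )
);

// Word( n, text, delimiters ) returns the nth delimited item of the text,
// or an empty string when the cell has fewer than n items
dt << New Column( "Mut1", Character, Formula( Word( 1, :Mut, ", " ) ) );
dt << New Column( "Mut2", Character, Formula( Word( 2, :Mut, ", " ) ) );
```

Add one more column per extra position (Word( 3, ... ) and so on) if some cells hold more than two mutations.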

DiedrichSchmidt_0-1613677240070.png

 

 

Hope this helps!

DS

 

Edit: I should also add that in rows with only one mutation, that mutation will show up only in the first column (the Word(1, s) column). When there are several, the second mutation goes in the second column, and so on. The Mutation 2 column will therefore be empty in every row where the gene has only one mutation.

Earendur
Level I

Re: fix for two values in a single cell

Thanks @SDF1 , that makes sense.

If I make two columns, how do I tell JMP that I want it to consider Mut1...MutN together when doing analyses?

SDF1
Super User

Re: fix for two values in a single cell

Hi @Earendur ,

 

  Are you using standard JMP? I can only give advice about analysis in JMP and JMP Pro, not JMP Clinical or JMP Genomics.

 

  That said, you can do a standard analysis: see the made-up table I created and its Fit Least Squares script. In general, you would cast the mutation columns as X factors, cast your response as Y, and run the platform. Depending on what you're doing, you could do an ANOVA, a logistic fit, a contingency analysis, etc. For visualization, you could even do a Scatterplot Matrix (Graph > Scatterplot Matrix) with character and numeric columns.

 

Hope this helps!

DS

Earendur
Level I

Re: fix for two values in a single cell

@SDF1 

I am using standard JMP. 

 

Thanks for the sample table. The "Fit Group" you made is very similar to what I am trying to achieve; I just need all mutation variants "A-Z" to show up on the same graph, not in two separate graphs.

When I do that with the "Mut" column, it treats "A, B", "A, D", etc. as single mutations and doesn't assign them to the individual "A", "B", and "D" categories along the x-axis.

SDF1
Super User

Re: fix for two values in a single cell

Hi @Earendur ,

 

  If you use the Mut 1 and Mut 2 columns, JMP will never put them in the same graph for the ANOVA analysis. You could, however, graph them together in Graph Builder. It will group them and nest the X-axis in a nice graphical way, but that doesn't do much for the proper analysis you might be after.

DiedrichSchmidt_0-1613680301681.png

  JMP treats the "A, B", "A, D", and "G, I" mutations as a single "factor" if you do a one-way with the Mut and Y columns. If certain mutations always occur as specific pairs (like with DNA pairs), then you would not want to split out the mutations into separate columns (I'd think), as the unique pairing is what is special, not each one individually so much.

 

  JMP is doing the right thing in each case. A one-way/ANOVA might not be the right analysis you're after either -- you might be better off with a logistic analysis or doing a least squares fit to the data. (I added more rows by copy/pasting the Mut column at the end of the table.)

DiedrichSchmidt_1-1613681080996.png

 

  A lot of this comes back to what @ron_horne was saying about how to structure your data for the right kind of analysis that you are trying to do.

 

Hope this helps!

DS

ron_horne
Super User (Alumni)

Re: fix for two values in a single cell

Hi @Earendur,

Perhaps someone with more relevant experience in the specific field can give you a more concrete answer.

When I come across situations like this, I first try to estimate the scale of the problem. How many cases have two mutations? Are there any with three or more? Are there many different combinations, or is one specific combination typical for pairs?

You can use Text Explorer on the mutation column to see the distribution of all individual mutations and combinations.

Once done with that, you need to make sure you define the unit of analysis correctly and decide whether the observations are independent, repeated, or nested.

Any answer you have to any of these questions may help you establish the correct data structure for your analysis.

Will you be using the mutation as an independent or dependent variable? Could the unit of analysis be the mutation, or does it have to be the patient? If it is the patient, the second mutation may be another row in the data, which may require adjusting for non-independent observations. If there are too many mutation categories, perhaps you want to aggregate them in a meaningful way, and that would also include the combinations.
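If the mutation ends up being the unit of analysis, one possible way to get one row per patient-mutation pair is to split the cell and then stack the pieces. The JSL below is only a sketch under assumptions: the table and the names Patient, Mut, Mut1, Mut2, Slot, and Mutation are made up, and it handles at most two mutations per cell.

```jsl
Names Default To Here( 1 );

// Made-up data; every table and column name here is a placeholder
dt = New Table( "Mutations (made up)",
	New Column( "Patient", Character, Set Values( {"P1", "P2", "P3"} ) ),
	New Column( "Mut", Character, Set Values( {"A", "A, B", "D"} ) )
);

// Split the combined cell into one column per mutation position
dt << New Column( "Mut1", Character, Formula( Word( 1, :Mut, ", " ) ) );
dt << New Column( "Mut2", Character, Formula( Word( 2, :Mut, ", " ) ) );
dt << Run Formulas;

// Stack the split columns so each patient-mutation pair gets its own row
dtLong = dt << Stack(
	Columns( :Mut1, :Mut2 ),
	Source Label Column( "Slot" ),
	Stacked Data Column( "Mutation" )
);

// Remove the empty rows produced by patients with only one mutation
dtLong << Select Where( :Mutation == "" );
dtLong << Delete Rows;
```

In the stacked table a patient with two mutations appears twice, so any patient-level columns carried along are repeated. That is the non-independence issue mentioned above, and the analysis has to account for it.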

hope this helps at least a little,

ron

 

 

 

 

Jeff_Perkinson
Community Manager

Re: fix for two values in a single cell

I think a solution depends on exactly what you want to do with the data, but the situation you find yourself in is why JMP has Multiple Response as a Modeling Type.

2021-02-18_18-59-43.627.png

This choice will treat each comma-separated value in the column as an individual value. It works well in Distribution, the Data Filter, Categorical, and some other places. It works in Graph Builder in JMP 16.
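The same setting can probably be applied from a script. This is a hedged sketch, not a confirmed recipe: the table and the column name Mut are made up, and the Multiple Response property and modeling type messages are written the way I believe JMP saves them in table scripts, so check them against the Scripting Index for your JMP version.

```jsl
Names Default To Here( 1 );

// Made-up data; "Patient" and "Mut" are placeholder names
dt = New Table( "Mutations (made up)",
	New Column( "Patient", Character, Set Values( {"P1", "P2", "P3"} ) ),
	New Column( "Mut", Character, Set Values( {"A", "A, B", "D"} ) )
);

col = Column( dt, "Mut" );

// Assumed syntax: declare the separator, then switch the modeling type
col << Set Property( "Multiple Response", Multiple Response( Separator( "," ) ) );
col << Set Modeling Type( "Multiple Response" );

// Distribution should now count A, B, and D as individual values
dt << Distribution( Column( :Mut ) );
```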

 

Multiple Response values frequently come up in surveys where a respondent is given the chance to choose more than one answer to a question (e.g., what magazines do you subscribe to?).

 

Tell us more about how you want to use the values there and maybe we can make more suggestions.

-Jeff
Earendur
Level I

Re: fix for two values in a single cell

@Jeff_Perkinson that is exactly what I am looking for!

 

Is there a way to get the multiple response column to work as my X factor in the Fit Y by X? I've tried to read up about multiple response but can't find an answer. Or is there a different platform I should be running?

Georg
Level VII

Re: fix for two values in a single cell

The Multiple Response feature was new to me too, so thanks @Jeff_Perkinson .

If you search for it in the Scripting Index, you will find the script enclosed below, which shows how to use it.

 

I also think it is best to separate the features into different columns first. That leaves you the most flexibility to investigate them and combine them as you need (there is also a nice way to combine columns in JMP).

 

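Names Default To Here( 1 );
// Open a JMP sample table that stores failures in several columns
dt = Open( "$SAMPLE_DATA/Quality Control/Failure3MultipleField.jmp" );
// Launch Categorical, treating the three failure columns as one multiple response
obj = dt << Categorical(
	X( :clean, :date ),
	Multiple Response( :Failure1, :Failure2, :Failure3 ),
	Frequency Chart( 0 )
);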
Georg