dagenu
Staff
Genetic Association with JMP Genomics, Part 6: SNP-Trait Association

JMP Genomics has an array of tools to associate genetic variants with traits. One of the most straightforward is SNP-Trait Association, which lets you quickly identify significant markers for one or more categorical or quantitative traits. In this example, we test ~22,000 markers for association with histiocytic sarcoma (a type of tumor) in a sample of 474 Bernese Mountain Dogs. The phenotype in this data is binary (affected or unaffected), but the same steps work for continuous traits and for multiple traits.

  1. From the Genomics Starter, select Genetics > GWAS Testing > SNP-Trait Association.
  2. In the General tab, select GPL15578_geno_num.sas7bdat as the Input SAS Data Set.
  3. In the box labeled List-Style Specification of SNP Variables, type “SNP:” to designate all columns starting with “SNP” as marker columns.
  4. From the Available Variables box, move phenotype into the Trait Variables box.
  5. The By Variables box can be left blank.
    • When variables are placed in the By Variables box, a separate analysis is performed for each unique value of that variable.
  6. Specify an Output Folder.
  7. In the Model Variables tab, select Binary as the Type of Trait. Remember, the phenotypic data for this trait consist only of affected and unaffected individuals.
  8. From the list of Available Variables, select origin and move it to the Class Variables box.
    • Class Variables are variables whose levels form distinct categories in the model. In this case, the origin variable has a distinct level for each country of origin of the samples.
  9. In the Fixed Effects box, type “origin” to set the variable as a fixed effect.
    • Here, you can manually designate Fixed, Random, and Interaction Effects, as well as Advanced Random Effect Model Options.
  10. Strata Variables are used to perform a stratified analysis of a binary trait. For this example, leave this box blank.
  11. In the Annotation tab, select GPL15578_anno.sas7bdat as the Annotation SAS Data Set.
    • This data set contains annotation information for each marker, such as chromosome and position.
  12. Set SNP_rec_ID as the Annotation Label Variable. Entries in this column will be matched to SNP IDs in the Input SAS Data Set.
  13. Select Chr as the Annotation Group Variable. This variable separates the results into groups, chromosomes in this case.
  14. Select Position as the Annotation Location Variable.
  15. In the Options tab, select Numeric Genotypes as the Format of SNP Variables.
    • For these numeric genotypes:
        0 = Homozygous for the major allele
        1 = Heterozygous
        2 = Homozygous for the minor allele
    • Thus, select Homozygous for Minor Allele as the Genotype to code as 2 for the Trend Test.
  16. In the Association Tests box, select Trend. If Genotype is also selected, JMP performs a Pearson chi-square test based on genotypes to identify markers of interest. Since this adds to the computing time, leave it unselected for this example.
  17. Check the box next to Calculate trend odds ratios. This is useful for binary or categorical traits.
  18. The Output tab has options for adding output to the results, including residuals, model fit statistics, and R-square statistics. Conversely, output can be reduced, for example by including only markers with significant p-values in the results.
    • For this example, leave the default boxes checked. Feel free to experiment with checking boxes and observing the resulting changes.
  19. In the P-Value Plots tab, keep -log10 as the Conversion for P-Values.
  20. For the Multiple Testing Correction option, select FDR.
    • Because thousands of markers are tested at once, an FDR (false discovery rate) correction is applied to limit the expected proportion of false positives among the markers called significant.
  21. If you want a p-Value Adjustment, you can select one on this tab. For this example, no adjustment is made.
  22. Set the Alpha, the threshold a marker's p-value must meet for the marker to be considered significant, to 0.05.
  23. Run the analysis.
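To see what the Trend test and the trend odds ratio measure for a single marker, here is a minimal Python sketch under stated assumptions. It implements the textbook Cochran-Armitage trend test on 0/1/2 minor-allele counts, written in its equivalent form N·r² (N individuals, r the correlation between genotype code and affected status), and estimates a per-allele odds ratio with a one-predictor logistic regression. It ignores the origin fixed effect, and reading "trend odds ratio" as this per-allele odds ratio is an assumption; JMP Genomics' own computations may differ. The function names are hypothetical.

```python
import math


def trend_test(genotypes, affected):
    """Cochran-Armitage trend test for one SNP and a binary trait.

    genotypes: minor-allele counts coded 0/1/2 (missing calls already removed)
    affected:  1 = affected, 0 = unaffected
    Returns (chi-square statistic with 1 df, p-value).
    """
    n = len(genotypes)
    g_mean = sum(genotypes) / n
    y_mean = sum(affected) / n
    s_gy = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotypes, affected))
    s_gg = sum((g - g_mean) ** 2 for g in genotypes)
    s_yy = sum((y - y_mean) ** 2 for y in affected)
    if s_gg == 0 or s_yy == 0:
        return 0.0, 1.0  # monomorphic SNP or only one trait class: nothing to test
    chi2 = n * s_gy ** 2 / (s_gg * s_yy)  # N * r^2
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


def per_allele_odds_ratio(genotypes, affected, iterations=25):
    """Odds ratio per extra minor allele from logit P(affected) = b0 + b1 * g.

    Assumption: this stands in for JMP's "trend odds ratio". Fitted by
    Newton-Raphson; assumes no perfect separation between the classes.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        grad0 = grad1 = h00 = h01 = h11 = 0.0
        for g, y in zip(genotypes, affected):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            grad0 += y - p          # score for the intercept
            grad1 += (y - p) * g    # score for the genotype slope
            h00 += w                # entries of the information matrix
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (h11 * grad0 - h01 * grad1) / det
        b1 += (h00 * grad1 - h01 * grad0) / det
    return math.exp(b1)


if __name__ == "__main__":
    # Made-up toy data: 30 individuals with genotype 0, 30 with genotype 1
    genotypes = [0] * 30 + [1] * 30
    affected = [1] * 10 + [0] * 20 + [1] * 20 + [0] * 10
    chi2, p = trend_test(genotypes, affected)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
    print(f"per-allele OR = {per_allele_odds_ratio(genotypes, affected):.3f}")
```

With only genotypes 0 and 1 present, as in the toy data, the trend statistic reduces to the ordinary 2×2 Pearson chi-square (20/3 here) and the fitted odds ratio equals the 2×2 table odds ratio (4 here), which makes the sketch easy to check by hand.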

Results

  1. When the analysis finishes running, the results window appears, showing the Summary Chart tab. This tab shows the number of significant markers on each chromosome.
  2. Each chromosome can be inspected individually by selecting it from the list under the Tabs heading on the left side of the window. Inspect chromosome 11 further by clicking Chr 11 Results, then click View Tab to add the Chr 11 Results tab to the results window.
    • Notice that the p-values for the markers are plotted by their position on chromosome 11. This information comes from the Annotation Data Set added earlier. Markers that fall above the red dotted line (p = 0.05) are statistically significant.
  3. The Manhattan Plot tab shows every SNP in the data, separated and colored by chromosome number. The y-axis represents the statistical significance of each marker after the False Discovery Rate correction has been applied.
  4. The last tab is the Volcano Plot. Each point on the plot is a single SNP, with the most statistically significant points at the top of the plot. The x-axis shows the fold change for each SNP, with larger changes appearing toward the far left and far right of the plot.
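The Summary Chart and Manhattan Plot combine two settings chosen earlier: an FDR correction across all markers and a -log10 conversion. As an illustration only, the sketch below uses the Benjamini-Hochberg step-up procedure, the most common FDR correction; that JMP Genomics applies exactly this variant is an assumption. The (snp_id, chromosome, p-value) tuples and the function names are hypothetical.

```python
import math
from collections import Counter


def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order.

    Assumption: BH stands in for JMP Genomics' FDR option.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def manhattan_points(results, alpha=0.05):
    """results: list of (snp_id, chromosome, raw p-value).

    Returns (points, counts): points are (snp_id, chromosome, -log10 adjusted p)
    for plotting; counts is the number of significant markers per chromosome,
    the quantity shown in a per-chromosome summary chart.
    """
    adjusted = bh_adjust([p for _, _, p in results])
    points = []
    counts = Counter()
    for (snp_id, chrom, _), q in zip(results, adjusted):
        # Guard against log10(0) for extremely small p-values.
        points.append((snp_id, chrom, -math.log10(max(q, 1e-300))))
        if q <= alpha:
            counts[chrom] += 1
    return points, counts
```

For example, raw p-values of 0.01, 0.04, 0.03, and 0.005 become adjusted values of 0.02, 0.04, 0.04, and 0.02: each p-value is scaled by m/rank, and a running minimum keeps a smaller raw p-value from ending up with a larger adjusted value.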

Drill Downs

On the left side of the results window are Drill Down tools for analyzing markers of interest in further detail.

  1. First, to create a subset of data, highlight points on the plots in any of the results tabs, and then click the Create Subset Genotype and Annotation Data Sets button in the Drill Downs panel.
    • Note that when points on one plot are highlighted, the same points are automatically highlighted on the other plots.
  2. Highlight the two points in the volcano plot with the highest y-values, then click the Create Subset Genotype and Annotation Data Sets button. A new window appears with three new data sets: the genotypic data for those two markers, the annotation data for those two markers, and the data for the remaining markers that were not highlighted.
  3. Click the Open button next to the Genotype data to open the subset. Note that this data set contains only two SNPs (SNP_rec_8930 and SNP_rec_15469).
  4. Click the Open button next to the Annotation data to open the annotations for the two SNPs. Both SNPs are located on chromosome 11.
  5. Return to the results window and click Plot Trait by Genotype for the two previously selected markers.
    • These plots show the distribution of genotypes among affected and unaffected individuals for each marker of interest. The annotation for each marker can be viewed by clicking the arrow next to Annotation Information.
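The drill-down steps amount to filtering the genotype and annotation tables by the selected marker IDs and then cross-tabulating genotype against trait. The sketch below mimics that logic on plain Python dictionaries; it is not how JMP Genomics implements the drill-downs. Column names (phenotype, SNP_rec_ID, Chr, and the "SNP" prefix) follow the tutorial's data sets, while the row-of-dicts layout, treating the third data set as the annotation rows of the unselected markers, and the helper names are assumptions.

```python
from collections import Counter


def subset_markers(genotype_rows, annotation_rows, selected_ids):
    """Split genotype and annotation data into selected vs. remaining markers.

    genotype_rows:   one dict per individual; marker columns start with "SNP"
    annotation_rows: one dict per marker, identified by its SNP_rec_ID value
    Assumption: "remaining" holds the annotation rows of unselected markers.
    """
    selected = set(selected_ids)
    genotype_subset = [
        # keep non-marker columns (e.g. phenotype) plus the selected SNP columns
        {col: val for col, val in row.items()
         if not col.startswith("SNP") or col in selected}
        for row in genotype_rows
    ]
    annotation_subset = [a for a in annotation_rows if a["SNP_rec_ID"] in selected]
    remaining = [a for a in annotation_rows if a["SNP_rec_ID"] not in selected]
    return genotype_subset, annotation_subset, remaining


def genotype_by_trait(rows, snp_id, trait="phenotype"):
    """Count individuals for each (trait level, genotype) pair at one marker,
    the same counts a Trait by Genotype plot displays."""
    return Counter((row[trait], row[snp_id]) for row in rows)
```

Calling genotype_by_trait(subset, "SNP_rec_8930") on the subset returns counts keyed by (trait level, genotype), for example ("affected", 2), so over-representation of one genotype among affected individuals is visible directly.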

*Interactive results from this analysis can be found on JMP Public.

JMP Genomics has many GWAS tools for marker-trait association. The SNP-Trait Association tool is a quick and easy way to identify markers of interest in large data sets. In this guide, we identified significant markers for a binary (affected or unaffected) trait. The same tool can be used for other kinds of traits as well, including categorical and quantitative traits.

10 Comments
Level I

Hi, I want to do SNP-Trait association on my BILs population to find markers that are related to phenotypic traits under heat stress.

Is it possible to include phenotypic data from several replicate plants for each line, rather than just one phenotypic value per line?

Staff

Hi @Neta,

Are you asking if you can pool individuals (lines) so that they are represented only once? Or are you saying that you want to force the trait (phenotype) to be identical for lines that are similar? Or maybe you are asking if multiple phenotypes can be added?

Maybe showing an example of what your data looks like, or what you want it to look like, might help.

Best,

Level I
Hi, I'll explain.
I characterized BILs (backcross inbred lines) for heat-stress-related traits such as single fruit weight, seeds per fruit, etc., and I have their genetic map (SNP map). I want to do SNP-Trait association to find markers related to these traits. For each line (out of the BILs), I have 4 replicates (i.e., 4 plants per line per treatment: control/heat stress). I think replicates are good for detecting a significant difference, right? But in your user guide I saw just one phenotypic value for each line. Is there a way to include all the replicates?
Thank you so much!!
Level I

BILs- traits.jpg

Staff

Hi @Neta,

 

The data table helps. It looks like you have a treated vs. control experiment and want to model SNP-Trait associations while taking this experimental variable (treatment) into account. I would include the Treatment variable in the Fixed Effects section of the Model Variables tab, much like origin in the blog example above. I would use the combination of line and plant number as the individual identifier (a unique ID for each plant).

 

You can include multiple traits (trait variables) in the Trait Variables section of the first tab (General) as long as they are the same type (Continuous, Binary, Nominal, ...). Each trait is analyzed independently of the others, but for each trait, the treatment is used to adjust the SNP-Trait association test (as a cofactor for each SNP being tested).

 

Look at the ? next to each setting in the dialog for more details about what is expected and the impact it has on the analysis.
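The treatment adjustment described above can be illustrated for one SNP and one continuous trait. This is a minimal sketch, assuming an additive 0/1/2 genotype code, a categorical treatment fixed effect, and ordinary least squares; it uses the Frisch-Waugh-Lovell shortcut (remove the treatment means from both the trait and the genotype, then regress one on the other). It is not JMP Genomics' implementation, it reports an F statistic rather than a p-value to stay within the Python standard library, and the function names are hypothetical.

```python
from collections import defaultdict


def residualize(values, groups):
    """Subtract each group's mean, which removes a categorical fixed effect."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, grp in zip(values, groups):
        sums[grp] += v
        counts[grp] += 1
    return [v - sums[grp] / counts[grp] for v, grp in zip(values, groups)]


def snp_effect_adjusted(trait, snp, treatment):
    """Additive SNP effect on a continuous trait, adjusted for treatment.

    Fits trait ~ treatment + snp by least squares and returns
    (slope, F statistic for the SNP term on 1 and n - k - 1 df),
    where k is the number of treatment levels.
    Note: every row is treated as an independent observation.
    """
    ry = residualize(trait, treatment)
    rg = residualize(snp, treatment)
    s_gg = sum(g * g for g in rg)
    if s_gg == 0:
        raise ValueError("genotype does not vary within treatment groups")
    s_gy = sum(g * y for g, y in zip(rg, ry))
    slope = s_gy / s_gg
    rss_reduced = sum(y * y for y in ry)   # model with treatment only
    ss_snp = s_gy * s_gy / s_gg            # extra sum of squares from the SNP
    rss_full = rss_reduced - ss_snp
    df_error = len(trait) - len(set(treatment)) - 1
    if rss_full <= 0:
        return slope, float("inf")  # perfect fit: no residual variance left
    return slope, ss_snp / (rss_full / df_error)
```

One caveat follows from the last docstring line: replicate plants of the same line share one genotype, so treating them as independent rows overstates the evidence. That is one reason to summarize the replicates per line before testing.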

Level I

Hi, thanks so much for the help.
But I still don't understand how to enter several phenotypic values for each line (8, for example) in the JMP file, when the same file holds the genotypic information (the SNP at each marker) for each line in a single row. Is there an option to supply the phenotypic information in a file separate from the genotypic information (SNPs)?

Staff

Hi @Neta 

You would join the two tables together if the information is in separate tables. One would need an identifier for each row in both tables to join the two data sets. For example, in your genotype data set, if you have line and plant number as two different columns and also the same columns in the phenotype data, you can use those two columns as keys to identify the rows that belong together.

You can use JMP's Join capability under the Tables menu, or you can use the Merge capability in JMP Genomics found under SAS Data Set Utilities. The latter will be faster for larger data sets.

 

Either way, you would need to save the result as a SAS Data Set for JMP Genomics to use in the SNP-Trait Analysis.
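The logic of such a join can be sketched in a few lines of Python. This is a minimal illustration, not JMP's Join or the JMP Genomics Merge utility, and the column names (BIL, Plant, Treatment, FruitWeight, SNP_1, SNP_2) are hypothetical. It shows the many-to-one case that comes up in this thread: when genotypes are stored once per line, each plant's phenotype row receives a copy of its line's genotype row, so the combined table has one row per plant.

```python
def join_on_keys(left_rows, right_rows, keys):
    """Many-to-one inner join on shared key columns.

    left_rows:  e.g. one phenotype row per plant
    right_rows: e.g. one genotype row per line; key values must be unique here
    keys:       names of the identifier columns present in both tables
    Left rows with no matching right row are dropped.
    """
    lookup = {}
    for row in right_rows:
        key = tuple(row[k] for k in keys)
        if key in lookup:
            raise ValueError(f"duplicate key in right table: {key}")
        lookup[key] = row
    joined = []
    for row in left_rows:
        match = lookup.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**match, **row})  # left-hand values win on name clashes
    return joined


if __name__ == "__main__":
    # Hypothetical data: two plants of BIL1, one of BIL2, none genotyped for BIL3
    phenotypes = [
        {"BIL": "BIL1", "Plant": 1, "Treatment": "control", "FruitWeight": 10.0},
        {"BIL": "BIL1", "Plant": 2, "Treatment": "heat", "FruitWeight": 7.5},
        {"BIL": "BIL2", "Plant": 1, "Treatment": "control", "FruitWeight": 9.0},
        {"BIL": "BIL3", "Plant": 1, "Treatment": "heat", "FruitWeight": 8.0},
    ]
    genotypes = [{"BIL": "BIL1", "SNP_1": 0, "SNP_2": 2},
                 {"BIL": "BIL2", "SNP_1": 2, "SNP_2": 2}]
    for row in join_on_keys(phenotypes, genotypes, ["BIL"]):
        print(row)
```

Building a dictionary from the right-hand table makes the join a single pass over each table, and rejecting duplicate keys guards against silently matching a plant to the wrong genotype row.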

Level I
Hi, there is a misunderstanding between us.
My question is how, in the same JMP file, I can enter the phenotypic information for a particular BIL (backcross inbred line) as 8 values (*8 rows*) for that BIL (i.e., replicate plants: 4 plants for the heat treatment and 4 for the control), and in the same file also put the genomic information for each BIL, with its SNP at each marker, where each BIL occupies only *one row*.
My question is not how to merge the data, but how the two can be combined in the first place, i.e., one row of genomic information and 8 rows of phenotypic information for the same BIL in the same file.
Second question: in the genomic file, the SNPs are coded as 1 and 3, with 1 for the cultivated parent and 3 for the wild parent. Do I have to change the codes to 0 and 2 (there are no heterozygotes), because that's what JMP understands?
Thank you so much!!


BILs- traits.jpg

Genomic_data.jpg

Staff

Hi @Neta,

So if I understand correctly, the genotype for each line (BIL) in the first image is the same for all 8 plants for that BIL. The second image is the genotype for each BIL. Is that correct?

 

Did you run a single trial in which genotype and treatment were randomized? Or did you run two separate trials, one for heat and one for control?

 

Basically, you would first perform an ANOVA on the phenotype data to summarize it (i.e., estimate the Heat vs. Control difference in response) for each phenotype for each BIL. That way you would have one value per BIL per phenotype. Then you could join in the genotype data set and perform SNP-Trait association using the difference for each phenotype as the response, or trait of interest (a change due to treatment). Even a lack of change due to treatment might be of interest as well.

 

The ANOVA model would depend on how you set up the trial (or trials).
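As a rough illustration of the summarizing step, the sketch below computes one value per BIL for one trait: the mean of the heat-stressed plants minus the mean of the control plants. That difference is what a simple one-way model within each BIL would estimate under a completely randomized design. It stands in for the design-specific ANOVA recommended above and is not a substitute for it. The column names and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def treatment_difference(rows, line_col, treatment_col, trait_col,
                         treated="heat", control="control"):
    """One value per line: mean(treated plants) - mean(control plants).

    Assumption: a simple mean difference stands in for a design-specific
    ANOVA. Lines missing either treatment are skipped.
    """
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        values[row[line_col]][row[treatment_col]].append(row[trait_col])
    diffs = {}
    for line, by_treatment in values.items():
        if by_treatment[treated] and by_treatment[control]:
            diffs[line] = mean(by_treatment[treated]) - mean(by_treatment[control])
    return diffs
```

The resulting dictionary has one entry per BIL. Written out as a BIL column and a difference column, it can be joined to the genotype table (one row per BIL) and used as the trait in SNP-Trait Association.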