
Q-K association analysis with SSR

lfsamayo_ncsu_e

Community Trekker

Joined:

Jan 30, 2016

Hello,

I am a new user of JMP. I have a collection of inbred lines of a crop, genotyped with 150 SSR markers, and the phenotype is a binary trait (0 and 1). Since I plan to do a Q-K association analysis, I have expanded the marker format using JMP Genomics 7.0, and I am using logistic regression for the association test. My case is similar to the example in the "Step-by-step guide to association analysis with SSR markers". I have followed most of the steps in the guide, but I have not been able to get successful results.

What I did is:

1) Expanded the format of the multi-allelic SSR markers (e.g. 150/150, 167/150, 167/167) to the 0, 1, and 2 format (where 0 and 2 are the homozygotes and 1 is the heterozygote).

2) Computed PCA in order to account for population structure.

3) Computed a marker-based relationship matrix (IBS).

4) Ran the Q-K association analysis using the Marker-Trait Association option of JMP Genomics.

    4.1 I have tried using the multi-allelic markers (not the EG_ markers), as suggested in the guide, checking the "Genotypes" box in the "Options" tab and using "/" as the genotype delimiter. The result is the following error message: "ERROR: File WORK.GENOTYPES.DATA does not exist"

    4.2 I have tried using the expanded markers (with the prefix EG_) and checking the "Numeric Genotypes" box in the "Options" tab. When I run the analysis it seems to start normally, but the process never ends; it appears to be frozen.
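For reference, the genotype expansion in step 1 and the IBS measure in step 3 can be sketched in plain Python. This is only an illustration of the general idea, not what JMP Genomics does internally: the function names `expand_marker` and `ibs` are hypothetical, the expansion assumes one 0/1/2 copy-count column per allele, and the IBS value assumes a simple proportion of shared alleles averaged over markers where both lines have a call.

```python
from collections import Counter


def expand_marker(genotypes, delimiter="/"):
    """Expand one multi-allelic marker into per-allele copy counts (0, 1 or 2).

    genotypes: list of strings such as "150/167"; None or "" marks a missing call.
    Returns a dict mapping each allele to a list of counts, one per line.
    (Hypothetical helper; the EG_ columns in JMP Genomics may be coded differently.)
    """
    calls = [g.split(delimiter) if g else None for g in genotypes]
    alleles = sorted({a for c in calls if c for a in c})
    return {
        allele: [None if c is None else c.count(allele) for c in calls]
        for allele in alleles
    }


def ibs(geno_a, geno_b, delimiter="/"):
    """Proportion of alleles shared identical-by-state between two lines.

    Markers where either line has a missing call are skipped.
    """
    shared, total = 0, 0
    for ga, gb in zip(geno_a, geno_b):
        if not ga or not gb:
            continue
        ca, cb = Counter(ga.split(delimiter)), Counter(gb.split(delimiter))
        shared += sum((ca & cb).values())  # alleles in common at this marker
        total += 2                         # two alleles per diploid call
    return shared / total if total else float("nan")


# One marker scored on three lines, as in the example genotypes above.
print(expand_marker(["150/150", "167/150", "167/167"]))
# {'150': [2, 1, 0], '167': [0, 1, 2]}

# Two lines scored at two markers (the second marker is made up for illustration).
print(ibs(["150/150", "200/204"], ["167/150", "200/200"]))
# 0.5
```

Note that a marker with many alleles expands into many columns, so 150 SSR markers can produce far more than 150 numeric variables to test.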

Could anyone help me with this analysis?

Thanks

LF

5 REPLIES
kelci_miclaus

Joined:

May 27, 2014

Hi LF,


I'm sorry to hear you're having trouble with the QK analysis with multi-allelic markers. I would like to refer you to our technical support specialists, and I will forward your contact to them to help open a support track. For the first issue we will likely need your settings and log file; they can help get the right information so we can figure out what's going on.


How long did you let it run for the 4.2 case?  Did you check your SAS Temp folder while it was running to see if files were still being processed?  How many lines do you have? (The size of your K matrix can add exponentially to processing, especially when you have multi-allelic markers that get expanded into many more tests than 150.)  I would suggest selecting just the first 5-10 markers and running those to see if that can finish.


You could also try K matrix compression under the Relatedness Measures sub-menu to reduce the size of your K matrix in the tests as well as get estimates to fix the random effect estimate as suggested by Zhang et al. (Nature Genetics, 2010). 
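As a rough illustration of what compressing a K matrix means, the sketch below averages the kinship values within and between groups of lines, so that an n x n matrix becomes a much smaller group-by-group matrix. It assumes the group assignment is already known (the clustering step is not shown), and the function name `compress_kinship` is hypothetical; it is not the JMP Genomics implementation.

```python
def compress_kinship(K, groups):
    """Average an individual-level kinship matrix into a group-level one.

    K: square list of lists, K[i][j] is the kinship between lines i and j.
    groups: list giving the group label of each line (assumed precomputed).
    Returns (labels, compressed) where compressed[a][b] is the mean kinship
    between members of labels[a] and members of labels[b].
    """
    labels = sorted(set(groups))
    members = {g: [i for i, x in enumerate(groups) if x == g] for g in labels}
    compressed = [
        [
            sum(K[i][j] for i in members[a] for j in members[b])
            / (len(members[a]) * len(members[b]))
            for b in labels
        ]
        for a in labels
    ]
    return labels, compressed


# Three lines, the first two placed in the same group (made-up values).
K = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
labels, K_small = compress_kinship(K, [0, 0, 1])
print(labels)   # group labels, in sorted order
print(K_small)  # 2 x 2 group-level kinship matrix
```

With a few hundred lines compressed into a handful of groups, the random effect in the mixed model has far fewer levels, which is where the reduction in processing comes from.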


Kelci Miclaus

lfsamayo_ncsu_e

Community Trekker

Joined:

Jan 30, 2016

Thank you for your answer,

The first time I ran the analysis (for the 4.2 case) it took more than eight hours (machine: Intel Xeon CPU E5-1620, 3.60 GHz, 64 GB RAM) without finishing.

While it was running I clicked on the “View Log” and “Open SAS Temporary Folder” buttons, and they did not respond. I have also tried with 10 markers and waited for almost 40 minutes, and it never finished. We have ~420 lines, which is not that many. However, I have also tried to compute a compressed K matrix, and I get the following error:

ERROR: None of the models converged. The K matrix cannot be compressed. Please check the model specifications.

syserr = 0

exiterror = 1

  ERROR: KMatrixCompression exited due to errors.


I am following most of the specifications in the guide, except that I selected a binary trait.

I can send the log files as you requested.

Thanks

LF

lfsamayo_ncsu_e

Community Trekker

Joined:

Jan 30, 2016

Also, I am not able to reproduce the analysis of the example in the "Step-by-step guide to association..." using the same data set and the same specifications. I get the following error in the first step (genotype expansion):

ERROR: The file WORK.AF (memtype=DATA) already exists. Rename was not done.

Thanks

LF

kelci_miclaus

Joined:

May 27, 2014

Hi LF,

Our development team tried and couldn't duplicate the error you're seeing. I'm going to put you in touch with our Genetics processes developer by email to help troubleshoot this further. That way you can more easily share log and settings files so we can help figure out the problems.

The error in K matrix compression is a little more helpful. When a trait is binary, the GLIMMIX procedure is used, and it is more sensitive to models that don't fit or converge well, so it could be that the model you are trying to fit to your data is not possible.

lfsamayo_ncsu_e

Community Trekker

Joined:

Jan 30, 2016

Dear Kelci,

Thank you for your support. I have tried different specifications to get the compressed K matrix. I always get an error when I place pca1, pca2 and pca3 in the "Q Matrix Variables" box of the "Model Variables" tab.

However:

1. When I place pca1, pca2 and pca3 in the "Q Matrix Variables" box and also check the "Columns of Q sum to 1" box, I get as a result a K matrix (of 420 lines) compressed into 11 groups.

2. When I only check the "Columns of Q sum to 1" box without placing the first three principal components in the "Q Matrix Variables" box, or when both boxes are left blank, the result is a K matrix compressed into 8 groups.

I have used each of the two compressed K matrices in the association tests, but the results are still unsatisfactory.

LF