
SNP Map file help...JMP Genomics importing Illumina SNP data across multiple files

nitdawg

Community Trekker

Joined:

Oct 2, 2013

Total newbie to JMP Genomics 6.0. I have the SNP final report data generated by Illumina studio software. The dataset contains 512 total samples, each with around 1E6 markers. The Final Report.txt files were split up over 13 txt files. I also have the Samples Table.csv that I converted to a tab-delimited file, so I assume I have the "sample files" selection satisfied.

The data import for Illumina supports spanning multiple genotype files; however, it clearly indicates that a SNP Map File is necessary when multiple genotype files are selected. My guess is that it helps assemble the multiple files together?

I cannot find an example of what the SNP Map File is supposed to contain (or, to double check, what the Sample Files is supposed to contain). I tried to see if there was a sample data set to help, but did not find one.

Any help is greatly appreciated!

Thanks!

Jonathan

14 REPLIES
dougr

Staff

Joined:

Oct 2, 2013

Hi Jonathan.

The SNP map file is a txt file that contains the SNP Name, Chr and Position variables. You should be able to get the complete map file for your chip from the core lab.

You can find a number of step-by-step guides for JMP Genomics here:

http://www.jmp.com/lifesciences-resources/resources.shtml?tab=1

Doug Robinson

JMP Life Sciences Specialist

nitdawg

Community Trekker

Joined:

Oct 2, 2013

Hi Doug,

Sounds great. I've asked for the file that contains that info and will give it a shot! Also, I'm checking out those guides...should be a great help. Will report back ASAP.

Best,

Jonathan

nitdawg

Community Trekker

Joined:

Oct 2, 2013

Wanted to provide an update...getting closer, I think.

I now have a SNP-Map.txt file, but I think there is something missing in the file as I get the following error:


ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant,
              a missing value, INPUT, PUT.


In the SNP_Map.txt file from the core it does not look like there are column headers, simply:

line1:     track     name="HumanOmniExpressExome-8v1_B"

line2:      chr14     35173623     35173624     exm-rs10139660

and all the other SNPs on the chip.

Should I reorder and put in a header line with something? SNP, Chr...what would I title the two position headers?

I think I am close

dougr

Staff

Joined:

Oct 2, 2013

Hi Jonathan,

Name the variables SNP_Name, Chr and Position. You only need one position column. Please also make sure that the delimiter is the same as in the genotype files.
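For anyone starting from the same kind of file, here is a minimal Python sketch of that conversion. It assumes the file from the core is a BED-style track (chromosome, start, end, SNP name), as the two sample lines quoted earlier suggest, and that the end column is the 1-based position to keep; neither assumption is confirmed in this thread, so check a few SNPs against the final report first. The function name `bed_to_snp_map` and the `strip_chr_prefix` option are made up for illustration.

```python
import csv
import io

def bed_to_snp_map(bed_lines, out, strip_chr_prefix=False):
    """Write a tab-delimited map (SNP_Name, Chr, Position) from BED-style rows.

    Assumption: columns are chrom, start, end, name, and 'end' is the
    position to keep. Returns the number of SNP rows written.
    """
    writer = csv.writer(out, delimiter="\t")  # csv default line ending is CRLF
    writer.writerow(["SNP_Name", "Chr", "Position"])
    written = 0
    for line in bed_lines:
        fields = line.split()
        # Skip blank lines and the 'track name=...' header line.
        if len(fields) < 4 or fields[0] == "track":
            continue
        chrom, _start, end, name = fields[:4]
        if strip_chr_prefix and chrom.startswith("chr"):
            chrom = chrom[3:]  # optional: 'chr14' becomes '14'
        writer.writerow([name, chrom, end])
        written += 1
    return written

# The two lines quoted earlier in the thread, used as a tiny test input.
sample = [
    'track\tname="HumanOmniExpressExome-8v1_B"',
    "chr14\t35173623\t35173624\texm-rs10139660",
]
buf = io.StringIO()
count = bed_to_snp_map(sample, buf)
print(buf.getvalue())
```

For a real file, open the input with `open(path)` and the output with `open(out_path, "w", newline="")` and pass them in place of `sample` and `buf`. Whether the Chr values should keep the `chr` prefix depends on how chromosomes are written in the final reports, which is not shown here.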

Doug

nitdawg

Community Trekker

Joined:

Oct 2, 2013

Hi Doug,

I updated the map file to contain only three columns: SNP_Name, Chr, and Position. The first row contains those headers, and the remaining 953891 rows have all the SNP map data. This is a tab-delimited file, as are my 13 final report files and my sample table file.

Trying to import the data, I still get the same error:


Starting skip_header_import_2
DataFile = E:\Genotype final report\SNP_Mapv2.txt
DataType = TAB
VarNameRow = 1
VarLabelRow = 0
UniqueVarNameFlag = 1



NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The infile "E:\Genotype final report\SNP_Mapv2.txt" is:
      Filename=E:\Genotype final report\SNP_Mapv2.txt,
      RECFM=N,LRECL=32767,File Size (bytes)=23782301,
      Last Modified=18Oct2013:15:47:43,
      Create Time=18Oct2013:15:47:43


NOTE: Unexpected end of file for binary input.
NOTE: DATA statement used (Total process time):
      real time           1.62 seconds
      cpu time            0.01 seconds
     


22: LINE and COLUMN cannot be determined.
17                                                         The SAS System                             15:50 Friday, October 18, 2013


NOTE 242-205: NOSPOOL is on. Rerunning with OPTION SPOOL might allow recovery of the LINE and COLUMN where the error has occurred.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, a numeric constant, a datetime constant,
              a missing value, INPUT, PUT. 


NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.00 seconds
     



skipbyte to scan column name line: 0


NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The infile "E:\Genotype final report\SNP_Mapv2.txt" is:
      Filename=E:\Genotype final report\SNP_Mapv2.txt,
      RECFM=N,LRECL=256,File Size (bytes)=23782301,
      Last Modified=18Oct2013:15:47:43,
      Create Time=18Oct2013:15:47:43


flag=1 VarName=SNP_name columnpt=10
flag=2 VarName=Chr columnpt=14
flag=3 VarName=Position columnpt=23
flag=4 VarName=200610-179 columnpt=34
flag=5 VarName=chrY columnpt=39
flag=6 VarName=18097249 columnpt=48
flag=7 VarName=200610-298 columnpt=59
flag=8 VarName=chrY columnpt=64
flag=9 VarName=14926202 columnpt=73
flag=10 VarName=200610-303 columnpt=84
flag=11 VarName=chrY columnpt=89
flag=12 VarName=23497067 columnpt=98


Ugggh, not sure what I'm missing. Any ideas?

If I just use the import engine for a single final report (which does not require a SNP map file) it works fine, but I need to load all 13 final reports...so this makes me still think I have something wrong with the SNP map file.
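One thing the log above hints at: the flag lines list data values (200610-179, chrY, 18097249, ...) as variable names, so the header scan appears to have run past the first line. A possible cause, and it is only a guess, is a line ending the importer does not recognize. A small Python sketch to count the line-ending styles in a file (the function name `count_line_endings` is made up for illustration):

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare CR and bare LF line endings in a block of bytes."""
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    return {"CRLF": crlf, "CR": cr, "LF": lf}

# Tiny in-memory example with bare CR endings (an assumed case, not the real file).
sample = b"SNP_Name\tChr\tPosition\r200610-179\tchrY\t18097249\r"
print(count_line_endings(sample))

# For a real file, read the first megabyte in binary mode, for example:
# with open(path_to_map_file, "rb") as fh:   # path_to_map_file is a placeholder
#     print(count_line_endings(fh.read(1_000_000)))
```

If the map file and the final reports show different styles, re-saving the map file with the same line endings as the final reports would be a cheap thing to try.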

Thanks!

Jonathan

dougr

Staff

Joined:

Oct 2, 2013

Hi Jonathan - is there any way you could share some of these files so we could take a look?

nitdawg

Community Trekker

Joined:

Oct 2, 2013

Sure thing!

I can arrange to send a final report if necessary; there are 13 final reports at 3.2 GB each.

dougr

Staff

Joined:

Oct 2, 2013

It looks as if the Genotype column does not contain any values in the Samples Table. That could be the problem.
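A quick way to see which columns of a tab-delimited Samples Table have no values at all is sketched below in Python. The function name `empty_columns` and every column name in the sample other than Genotype are made up for illustration; the real Samples Table layout is not shown in this thread.

```python
import csv
import io

def empty_columns(handle, delimiter="\t"):
    """Return the header names of columns that are blank in every data row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    has_value = {name: False for name in (reader.fieldnames or [])}
    for row in reader:
        for name in has_value:
            # Treat missing cells and whitespace-only cells as empty.
            if (row.get(name) or "").strip():
                has_value[name] = True
    return [name for name, seen in has_value.items() if not seen]

# Hypothetical two-sample table in which the Genotype column is blank.
sample = (
    "Sample ID\tGenotype\tCall Rate\n"
    "S1\t\t0.99\n"
    "S2\t\t0.98\n"
)
print(empty_columns(io.StringIO(sample)))
```

For the real file, pass `open(path, newline="")` instead of the `io.StringIO` object.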

nitdawg

Community Trekker

Joined:

Oct 2, 2013

Thanks Doug! What values are supposed to appear in that column?

I can ask the core to re-export the Samples Table, but there are other empty columns in it as well. Do they all need values?