Read/omit particular columns from a .csv file

David_Morrow · Dec 14, 2018 07:59 AM

Hi,

I would like to read in particular columns from a .csv file. The .csv files (which I currently read into jmp and concatonate) have hundreds of columns and hundreds of thousands of lines so the time to read the files into JMP becomes very large and also the JMP file becomes so big that it eventually freezes. As not all of the columns in the CSV are required for analysis I was wondering if a JSL script could target specific columns and therefore reduce the load time (and JMP file size). I have included a very much simplified example below and simply would like to read in the .csv file below but omit the "sample_no" and "scene__max" columns so the jmp table only has 3 columns. (In reality I will be looking to exclude ~100 columns by name)

Any inputs on this would be much appreciated.

measurement_no	sample_no	level	scene__max	scene__no_of_targets
1	0	0	4	1
0	2	0	3	1
0	0	3	5	6
0	0	7	4	1

uday_guntupalli · Dec 14, 2018 5:18 AM

@David_Morrow,
In my experience, it is always better to read the file into JMP and perform filtering out of columns or rows after the data is in a JMP data table. This would be especially true when the data you want to exclude is not in a squential order. Most programming languages have optimized csv/xlsx readers built in. Unless you are willing to re-shape your table in such a way that , you can move all the columns to be filtered to one end and read the columns on the other side, I don't think filtering out alternate columns might be really helpful in this case.

Additionally, when you are opening the .csv file using the Open() are you making it a private data table ? Making a data table private while opening a file makes it a lot faster.

Best
Uday

txnelson · Dec 14, 2018 08:21 AM

I looked but did not find anyway to pick and choose which columns to read in from and Excel workbook. There are options where you can select the starting column and ending column number, and starting and ending row number

Data Starts on Row( 2 ), Data Starts on Column( 1 ), Data Ends on Row( 0 ), Data Ends on Column( 0 )

Jim

David_Morrow · Dec 19, 2018 04:37 AM

Thank you for your feedback I may have to look also at some pre-processing.

ian_jmp · Dec 19, 2018 04:58 AM

If you could generate .XLSX instead of .CSV files, maybe there's a chance to use an ODBC connection to Excel and use JMP's Query Builder (and SQL) to use a subset of the columns for importing?

And depending on what the possibilities are for restructuring the data (if any), you should be aware of the virtual join in JMP (if you are not already).

Craige_Hales · Dec 19, 2018 11:30 AM

from the text import wizard for CSV files, on the 2nd page, mark the column excluded:

red slash circle

When you import, the script has the code you need to do it again:

Omitted Column

Craige

Read/omit particular columns from a .csv file

Re: Read/omit particular columns from a .csv file

Re: Read/omit particular columns from a .csv file

Re: Read/omit particular columns from a .csv file

Re: Read/omit particular columns from a .csv file

Re: Read/omit particular columns from a .csv file