cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
BayesRabbit7133
Level III

Import multiple CSV files, but some title format error need to modify by JSL

I'm trying to import multiple CSV files, the "import multiple files" function doesn't work well because original data column name format has two problems that will cause title and data don't match

 

1. The data column name looks like this:       Date time name ( A B C ) (D E F ) height weight ......etc 

2. The dilim I need are space and ( and )

3. However, there is additional space or tab before "Date", so the first column name becomes Column 1 rather than Date, and title right shift one column compare to data

4. second error is (D, there in no space between this two for the title, so they are treated as one title, but the data are separated into ( and D two columns because there is space in between, so title will left shift one column compare to data

5. The first error always exist before first column name "Date", but the second error "(D" may appear at different order based on source tool

 

I would like to wirte a JSL so that it can automatically match all title and columns, but not sure how to do this:

I guess the first step is to use  Get Column Names() then parse the names, but parse() and word() seems don't work.

I' m stucked here, really raw at coding, thank you for your help in advance

 

sample data.png

14 REPLIES 14
Craige_Hales
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

Concatenate data tables  shows one way to do it

Craige
BayesRabbit7133
Level III

Re: Import multiple CSV files, but some title format error need to modify by JSL

After several trial, below code sucessfully import multiple CSVs through loop, and the format are correct.

But I still have two remain issue:

 

1. starting from second file, the headers are not exclude as column title, they are treated as data in the table, there are two rows of headers, first row is file name, second row are titles separted by space

 

2. I would like to add filename column so that i can analyze different files, but this is not like import mutiple file function that I can easily achieve it?

 

 

 

Names Default To Here( 1 );


dir = Pick Directory( "Select a directory" );

filelist = Files In Directory( dir );

mylist = {};


For( i = 1, i <= N Items( filelist ), i++,
file = dir || "/" || filelist[i];
str = Load Text File( file );
lines = Words(str, "\!N");
data = Transform Each({line}, lines,
line = Collapse Whitespace(Substitute(line, {"(", ")"}, ""))
);
modifiedstr = Concat Items(data, "\!N");
mylist[i] = modifiedstr;
);

dt = Open(
Char To Blob( Concat Items( mylist, "\!N" ) ),
Import Settings(
End Of Line( CRLF, CR, LF ),
End Of Field( Spaces, Space, CSV( 0 ) ),
Strip Quotes( 1 ),
Use Apostrophe as Quotation Mark( 0 ),
Use Regional Settings( 0 ),
Treat empty columns as numeric( 0 ),
Compress Numeric Columns( 0 ),
Compress Character Columns( 0 ),
Compress Allow List Check( 0 ),
Labels( 1 ),
Column Names Start( 2 ),
First Named Column( 1 ),
Data Starts( 3 ),
Lines To Read( "All" ),
Year Rule( "20xx" )
)
);

jthi
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

Don't create one massive string, but rather create data tables one by one and concatenate those as you go. Also when posting JSL scripts please use "Insert JSL Script" jthi_0-1703167167340.png as it makes it much easier to read.

 

Names Default To Here(1);

dir = "$DOWNLOADS/New Folder (7)";

filelist = Files In Directory(dir);
mylist = {};

dt_collector = Empty();

For(i = 1, i <= N Items(filelist), i++,
	file = dir || "/" || filelist[i];
	str = Load Text File(file);
	lines = Words(str, "\!N");
	data = Transform Each({line}, lines,
		line = Collapse Whitespace(Substitute(line, {"(", ")"}, ""))
	);
	modifiedstr = Concat Items(data, "\!N");

	dt = Open(
		Char To Blob(modifiedstr),
		Import Settings(
			End Of Line(CRLF, CR, LF),
			End Of Field(Spaces, Space, CSV(0)),
			Strip Quotes(1),
			Use Apostrophe as Quotation Mark(0),
			Use Regional Settings(0),
			Treat empty columns as numeric(0),
			Compress Numeric Columns(0),
			Compress Character Columns(0),
			Compress Allow List Check(0),
			Labels(1),
			Column Names Start(1),
			First Named Column(1),
			Data Starts(2),
			Lines To Read("All"),
			Year Rule("20xx")
		),
		invisible
	);
	
	dt << Set Name(filelist[i]);
	dt << New Column("Source", Character, Nominal, << Set Each Value((dt << get name)));
	
	If(!Is Empty(dt_collector),
		dt_collector << Concatenate(
			dt,
			Append to first table
		);
		Close(dt, no save);
	,
		dt_collector = dt;
	);
);

jthi_2-1703167262034.png

 

-Jarmo
BayesRabbit7133
Level III

Re: Import multiple CSV files, but some title format error need to modify by JSL

Thank you for your support, really learned a lot in this jsl project. 

Craige_Hales
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

@jthi  is right, but if you want to keep concatenating the strings, you'll need a few changes.  This will fail if the concatenation is >2GB characters because that is the longest string JMP can handle.

idea:


For( i = 1, i <= N Items( filelist ), i++,
    file = dir || "/" || filelist[i];
    str = Load Text File( file );
    lines = Words(str, "\!N");
    header1 = RemoveFrom(lines, 1); // remove the first line, two times.
    header2 = RemoveFrom(lines, 1); // (removes both header lines.)
    data = Transform Each({line}, lines,
        // prepend the file name to each data line
        filelist[i] || " " || Collapse Whitespace(Substitute(line, {"(", ")"}, ""));
    );
    modifiedstr = Concat Items(data, "\!N");
    mylist[i] = modifiedstr;
);
header2 = "filename" || header2; // add the column name "filename"
insertInto(mylist,header2,1); // check the order of the arguments, the goal
insertInto(mylist,header1,1); // is to push these onto the front of the list
// now do the import...

untested, but about right. look up InsertInto() and RemoveFrom(); I don't have JMP open at the moment. The idea is to remove (and capture) the first two lines, every time, and after all is captured and stacked, put the first two lines at the front, with the extra column name tacked on. And inside, tack on the filename for every line.

 

Craige