cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
BayesRabbit7133
Level III

Import multiple CSV files, but some title format error need to modify by JSL

I'm trying to import multiple CSV files, the "import multiple files" function doesn't work well because original data column name format has two problems that will cause title and data don't match

 

1. The data column name looks like this:       Date time name ( A B C ) (D E F ) height weight ......etc 

2. The dilim I need are space and ( and )

3. However, there is additional space or tab before "Date", so the first column name becomes Column 1 rather than Date, and title right shift one column compare to data

4. second error is (D, there in no space between this two for the title, so they are treated as one title, but the data are separated into ( and D two columns because there is space in between, so title will left shift one column compare to data

5. The first error always exist before first column name "Date", but the second error "(D" may appear at different order based on source tool

 

I would like to wirte a JSL so that it can automatically match all title and columns, but not sure how to do this:

I guess the first step is to use  Get Column Names() then parse the names, but parse() and word() seems don't work.

I' m stucked here, really raw at coding, thank you for your help in advance

 

sample data.png

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

Don't create one massive string, but rather create data tables one by one and concatenate those as you go. Also when posting JSL scripts please use "Insert JSL Script" jthi_0-1703167167340.png as it makes it much easier to read.

 

Names Default To Here(1);

dir = "$DOWNLOADS/New Folder (7)";

filelist = Files In Directory(dir);
mylist = {};

dt_collector = Empty();

For(i = 1, i <= N Items(filelist), i++,
	file = dir || "/" || filelist[i];
	str = Load Text File(file);
	lines = Words(str, "\!N");
	data = Transform Each({line}, lines,
		line = Collapse Whitespace(Substitute(line, {"(", ")"}, ""))
	);
	modifiedstr = Concat Items(data, "\!N");

	dt = Open(
		Char To Blob(modifiedstr),
		Import Settings(
			End Of Line(CRLF, CR, LF),
			End Of Field(Spaces, Space, CSV(0)),
			Strip Quotes(1),
			Use Apostrophe as Quotation Mark(0),
			Use Regional Settings(0),
			Treat empty columns as numeric(0),
			Compress Numeric Columns(0),
			Compress Character Columns(0),
			Compress Allow List Check(0),
			Labels(1),
			Column Names Start(1),
			First Named Column(1),
			Data Starts(2),
			Lines To Read("All"),
			Year Rule("20xx")
		),
		invisible
	);
	
	dt << Set Name(filelist[i]);
	dt << New Column("Source", Character, Nominal, << Set Each Value((dt << get name)));
	
	If(!Is Empty(dt_collector),
		dt_collector << Concatenate(
			dt,
			Append to first table
		);
		Close(dt, no save);
	,
		dt_collector = dt;
	);
);

jthi_2-1703167262034.png

 

-Jarmo

View solution in original post

14 REPLIES 14
jthi
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

If it is possible to provide one example (or few if there are different kinds of) .csv file it would make this easier. If not, could you provide one "broken" data table?

-Jarmo
BayesRabbit7133
Level III

Re: Import multiple CSV files, but some title format error need to modify by JSL

I attached sample table for your reference

jthi
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

Based on the two line example I would most likely go by cleaning the .csv file (remove () and collapse whitespaces) and then use Open with Char to Blob() to convert that "fixed" string into a data table

Names Default To Here(1);

str = Load Text File("$DOWNLOADS/sample CSV.csv");

lines = Words(str, "\!N");

data = Transform Each({line}, lines, 
	str = Collapse Whitespace(Substitute(line, {"(", ")"}, ""))	
);

str = Concat Items(data, "\!N");

dt = Open(Char To Blob(str),
	Import Settings(
		End Of Line(CRLF, CR, LF),
		End Of Field(Spaces, Space, CSV(0)),
		Strip Quotes(1),
		Use Apostrophe as Quotation Mark(0),
		Use Regional Settings(0),
		Scan Whole File(1),
		Treat empty columns as numeric(0),
		CompressNumericColumns(0),
		CompressCharacterColumns(0),
		CompressAllowListCheck(0),
		Labels(1),
		Column Names Start(1),
		First Named Column(1),
		Data Starts(2),
		Lines To Read("All"),
		Year Rule("20xx")
	)
);
-Jarmo
txnelson
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

@jthi 

Your solution is awesome!

One question.  Your line where you are removing the parentheses from each line, you specify

str = Collapse Whitespace(Substitute(line, {"(", ")"}, ""))	

where you substitute all "(" to ")" and then substitute all ")" to "".

My old school approach would be:

str = Collapse Whitespace(Substitute(line, "(", "", ")", ""))

where the "(" are substituted with "" and then the ")" are substituted to "".

Is there an advantage to your syntax or is it just a personal choice?

Jim
jthi
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

It is just my personal choice. Generally when I want to replace multiple things with same thing, I provide Substitute a list of the values to replace and then the value once. If I want to replace them with different things I use the approach you did show OR if I have to build it dynamically/have configuration file I will use two lists

Names Default To Here(1);

str = "AA BB CC DD";

a = Substitute(str, {"AA", "BB"}, "11"); // a = "11 11 CC DD";
b = Substitute(str, "AA", "11", "BB", "22"); // b = "11 22 CC DD";

subs = ["AA" => "33", "BB" => "44"];
c = Substitute(str, subs << get keys, subs << Get Values); // c = "33 44 CC DD";

show(a, b, c);
-Jarmo
BayesRabbit7133
Level III

Re: Import multiple CSV files, but some title format error need to modify by JSL

Thank you for your support, this method works well for single file

But I still have one question left, I have a lot of  files with same format that needs to be imported as one JMP file.

So it seems this method requires me to open them one by one then do cocatenate into one file 

This is why my original thought was:

1. use multiple import function 

2.  get column names

3. then use recode column name function to modify each column name

jthi
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

Is there some specific reason why you couldn't do the following

  1. Get list of files you are interested in
  2. Use For Each to loop over them
  3. Inside the loop parse the files and concatenate them to a table which you open first

You could try to also fix the columns but based on your example it was much easier to just fix the csv file than start messing with column names and/or shifting values after importing the values

-Jarmo
Craige_Hales
Super User

Re: Import multiple CSV files, but some title format error need to modify by JSL

Fixing the program that generates the file might also be a good option. That is not a standard CSV and I don't think any other program is going to parse it correctly either.

Craige
BayesRabbit7133
Level III

Re: Import multiple CSV files, but some title format error need to modify by JSL

I tried to use for loop to loop over files, but it doesn't work, looks like something goes wrong when I tried to concat them
 
Names Default To Here(1);
dir = Pick Directory("Select a directory");
filelist = Files In Directory(dir);
strings = {};
 
For(i = 1, i <= N Items(filelist), i++,
	file = dir || Eval(Parse(filelist[i]));
	str = Load Text File(file);
	lines = Words(str, "\!N");
	data = Transform Each({line}, lines,
		str = Collapse Whitespace(Substitute(line, {"(", ")"}, ""))
	);
	str = Concat Items(data, "\!N");
	strings = strings || str;
);
 
 
 
dt = Open(
	Char To Blob(str),
	Import Settings(
		End Of Line(CRLF, CR, LF),
		End Of Field(Spaces, Space, CSV(0)),
		Strip Quotes(1),
		Use Apostrophe as Quotation Mark(0),
		Use Regional Settings(0),
		Scan Whole File(1),
		Treat empty columns as numeric(0),
		CompressNumericColumns(0),
		CompressCharacterColumns(0),
		CompressAllowListCheck(0),
		Labels(1),
		Column Names Start(2),
		First Named Column(1),
		Data Starts(3),
		Lines To Read("All"),
		Year Rule("20xx")
	)
);