cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
New to using JMP? Hit the ground running with the Early User Edition of Discovery Summit. Register now, free of charge.
Register for our Discovery Summit 2024 conference, Oct. 21-24, where you’ll learn, connect, and be inspired.
Choose Language Hide Translation Bar
miguello
Level VI

Read CSV line by line to find where data starts

I need to load aCSV file which might be in number of different formats. And I need to find what line my data starts on. I know that the column names start after a certain line, let's say "[Data starts here]".

This is what I do:

ex = Load Text File(
	filePath
);
words = Words( ex, "\!n\!r" );
headersLocation = Loc(words, "[Data starts here]");

and then I use headersLocation+1 to specify where my headers start.

 

The problem happens when I have an empty line in my CSV file:

line1
line2

[Data Starts Here]
Header1, header2
0,0
1,1

When I break this file into words, line "[Data STarts Here]" will be the third one, Loca() function will give me 3 in this case, while in reality it's 4th line.

So when breaking file by Words function - that empty line just doesn't make it into the list. 

And I don't know how many empty lines I may have in the file before data starts.

How do I go about it?

 

I can break by either "\!n" or "\!r" only, and then use Left() or Right() in a for loop to remove the remaining special symbol, but it seems too brute force and not very optimal and I'm not sure it is correct. Something like this:

For Each( {value, index}, words, words[index] = Right(value, Length(value)-1););

The whole script being:

ex = Load Text File(
	filePath
);
words = Words( ex, "\!n\!r" );
For Each( {value, index}, words, words[index] = Right(value, Length(value)-1););
headersLocation = Loc(words, "[Data starts here]");

Then it seems to be working. But is this the best and even a correct solution?

 

1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Read CSV line by line to find where data starts

Do you need the location for JMP's data import platform? If you have no need for the data before [Data Starts Here], you could just drop it

Names Default To Here(1);
csv = "line1
line2

[Data Starts Here]
Header1, header2
0,0
1,1
";

string_to_search = "[Data Starts Here]";
data_start_idx = Contains(csv, string_to_search);

data_str = Substr(csv, data_start_idx + Length(string_to_search) + 1);

And if you need the JMP's import, you can use Char To Blob() with the string you now have:

dt = Open(
	Char To Blob(data_str),
	Import Settings(
		End Of Line(CRLF, CR, LF),
		End Of Field(Comma, CSV(0)),
		Strip Quotes(1),
		Use Apostrophe as Quotation Mark(0),
		Use Regional Settings(0),
		Scan Whole File(1),
		Treat empty columns as numeric(0),
		CompressNumericColumns(0),
		CompressCharacterColumns(0),
		CompressAllowListCheck(0),
		Labels(1),
		Column Names Start(1),
		Data Starts(2),
		Lines To Read("All"),
		Year Rule("20xx")
	)
);
-Jarmo

View solution in original post

2 REPLIES 2
jthi
Super User

Re: Read CSV line by line to find where data starts

Do you need the location for JMP's data import platform? If you have no need for the data before [Data Starts Here], you could just drop it

Names Default To Here(1);
csv = "line1
line2

[Data Starts Here]
Header1, header2
0,0
1,1
";

string_to_search = "[Data Starts Here]";
data_start_idx = Contains(csv, string_to_search);

data_str = Substr(csv, data_start_idx + Length(string_to_search) + 1);

And if you need the JMP's import, you can use Char To Blob() with the string you now have:

dt = Open(
	Char To Blob(data_str),
	Import Settings(
		End Of Line(CRLF, CR, LF),
		End Of Field(Comma, CSV(0)),
		Strip Quotes(1),
		Use Apostrophe as Quotation Mark(0),
		Use Regional Settings(0),
		Scan Whole File(1),
		Treat empty columns as numeric(0),
		CompressNumericColumns(0),
		CompressCharacterColumns(0),
		CompressAllowListCheck(0),
		Labels(1),
		Column Names Start(1),
		Data Starts(2),
		Lines To Read("All"),
		Year Rule("20xx")
	)
);
-Jarmo
miguello
Level VI

Re: Read CSV line by line to find where data starts

Hmmm... Interesting suggestion. Let me try it. It might be easier after all.