cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
tsl
tsl
Level III

Looking for tips for rapid conversion of string to matrix

have data that I retrieve from a database which is in a pipe-delimited string format. For example "70|71|69|78". These strings represent numeric measurements.

A large number of measurements are made and the resulting strings are really long. On top of that I have a large dataset of these strings.

What I need to do is convert the strings to a matrix ( vector really ) and run some aggregating statistics on the resulting vector ( Max, 95th percentile etc ). I haven't found a way to do it in SQL on the database side which would be better still.

Here's how I'm doing it today, using a typical lengthy string example.

ExampleStr = "70|71|71|72|72|72|72|72|74|74|74|74|74|74|74|74|74|74|75|75|75|75|75|75|75|75|75|75|75|75|75|75|75|75|75|75|75|75|75|76|75|75|76|76|76|75|75|76|76|75|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|77|76|76|76|76|76|76|76|76|76|76|76|76|77|76|76|76|76|76|76|77|76|77|76|77|76|76|77|76|76|76|76|76|76|76|76|77|76|76|76|76|77|77|77|77|76|76|77|77|76|76|76|76|77|76|77|76|76|76|76|76|77|76|76|76|77|77|77|76|76|77|76|76|76|77|76|76|76|76|76|76|77|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|77|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|76|77|76|76|76|76|76|76|76|76|76|77|77|77|77|77|77|77|77|76|77|77|76|76|77|76|76|77|76|77|76|77|76|77|77|77|76|76|76|77|77|76|76|77|76|76|76|76|76|76|76|77|76|76|76|76|76|76|76|77|77|77|76|77|77|76|76|77|76|77|76|77|77|77|77|77|76|76|77|77|77|77|77|77|76|76|77|77|77|77|77|77|76|77|77|77|77|77|77|77|77|77|77|77|76|77|77|77|77|76|77|77|76|76|76|77|77|77|77|77|77|77|77|77|76|76|77|77|77|77|77|77|77|77|77|77|76|76|76|77|77|77|77|77|77|76|77|77|77|76|77|77|77|77|76|76|77|77|77|77|77|77|77|77|77|77|76|77|77|77|77|76|77|77|77|77|77|76|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|76|77|77|76|77|77|77|77|76|77|76|77|77|77|77|77|77|77|77|77|77|76|77|77|77|77|77|77|77|77|77|76|77|77|77|77|77|76|77|77|77|77|77|77|77|77|77|77|77|77|76|77|77|77|77|77|77|77|76|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|76|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|77|76|77|77|77|77|77|77|77|77|77|77|77|77|77|78|78|78|78|78|78|78|78|78|78|77|78|78|77|78|77|77|77|78|78|78|78|78|78|77|78|77|78|78|78|78|78|78|77|78|78|78|77|78|77|78|78|78|77|78|77|78|78|78|78|78|77|78|78|78|78|78|78|78|78|78|77|78|78|78|77|78|78|78|77|77|78|78|78|78|78|78|78|78|78|78|78|78|78|77|77|78|77|77|78|77|78|78|78|78|78|78|78|78|77|77|78|78|78|77|77|78|78|78|78|78|78|78|78|78|78|78|78|77|78|78|78|78|78|78|77|78|78|78|78|77|78|78|78|78|77|77|78|78|78|78|78|78";
		t1 = HPTime();
		V = parse("["||substitute(ExampleStr,"|",",")||"]") ;
		t2 = HPTime();
		show((t2-t1)/1000); // takes about 0.9 milliSec

So I replace the pipes with commas, then make the string look like a matrix by prefixing and suffixing matrix brackets. That part is quick.

Where the bulk of the time goes is in the Parse() call.

For this example with 775 values in the pipe-delimited string, it takes about 0.9 mSec to do the above code on my computer.

That might seem like very little time, but I need it to be at least an order of magnitude less. Because my data table, in my current example, has 762,000 rows of data!

Not every string is 775 values long, some are shorter, and not every iteration takes 0.9 mS, but in total it took almost 10 mins to go through the data table and do what I need to do  which is something like this :

 

dt << new column("P95_Value");
foreachrow(
	V = parse("["||substitute(:StrColumn,"|",",")||"]") ;
	:P95_Value = VQuantile(V,0.95);	
);

The VQuantile() call is not a problem, it is very fast. The bulk of the time goes on the parse() line.

I just wondered is there any smarter way to process this string and turn it into a vector, ideally in more like 0.09 milliSec or better !

Perhaps I may need to "farm out" this processing to a C# or Python script which may handle it faster ?

10 REPLIES 10
pmroz
Super User

Re: Looking for tips for rapid conversion of string to matrix

Another brute force solution using words.  Speeds are comparable to parse.

t1 = HPTime();
w = words(examplestr, "|");
n = nitems(w);
m = j(n, 1);
for (i = 1, i <= n, i++,
	m[i] = num(w[i]);
);
t2 = HPTime();
show((t2-t1)/1000);