I'm trying to understand Parallel Assign() so I can speed up some small operations. When I run the following script (warning: it ran out of memory and locked everything up for me), it gets dramatically slower as the number of rows in the vector grows. Each pass through the loop multiplies the row count by 10.
```jsl
For(i = 1, i <= 5, i++,
    t = [59760, 71220, 79380, 83820, 3300, 11400, 27900, 39060, 43500, 51300, 56220, 60];
    t = Repeat(t, 10^i);
    n = N Items(t);
    diff = t[2 :: n] - t[1 :: (n - 1)];
    day_v = diff < 0;
    st = HP Time();
    day_cum = Cumulative Sum([0] |/ day_v);
    tot_cum = HP Time() - st;
    st = HP Time();
    day_break = Loc(day_v); // row index of the last row of each day
    // make an n x (number of day breaks) matrix
    m = J(n, N Items(day_break), -1);
    Parallel Assign({db = day_break},
        // set the element to 1 if its row is past the corresponding day break, else 0
        m[a, b] = a > db[b]
    );
    m;
    // now just sum each row of the matrix
    day = V Sum(m`); // add 1 if you want the day to start at 1
    tot_pa = HP Time() - st;
    Show(n, tot_pa, tot_cum);
);
```
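A minimal sketch (not part of the original script) that runs a single small case, i = 1, so n = 120 rows and `m` is only 120 x 20, and checks that the Parallel Assign() result matches Cumulative Sum(). The name `ok` is introduced here for illustration. It also uses Shape() and All(), which the original script does not; Shape() is there so the comparison does not depend on whether each result comes back as a row or a column vector.

```jsl
// Sketch: compare the Parallel Assign approach with Cumulative Sum() for i = 1
t = Repeat([59760, 71220, 79380, 83820, 3300, 11400, 27900, 39060, 43500, 51300, 56220, 60], 10);
n = N Items(t);                                  // 120 rows
day_v = (t[2 :: n] - t[1 :: (n - 1)]) < 0;       // 1 where the value drops (new day)
day_cum = Cumulative Sum([0] |/ day_v);          // day index via cumulative sum
day_break = Loc(day_v);                          // 20 day breaks for this t
m = J(n, N Items(day_break), -1);                // 120 x 20, small enough to inspect
Parallel Assign({db = day_break},
    m[a, b] = a > db[b]                          // 1 if row a is past break b
);
day = V Sum(m`);                                 // row sums of m, as a 1 x n vector
// reshape both to n x 1 and check every element matches
ok = All(Shape(day, n, 1) == Shape(day_cum, n, 1));
Show(ok);
```

Keeping n this small is a convenient way to look at `m` directly (for example with Show(m)) while experimenting.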
I don't expect Parallel Assign() to be faster than Cumulative Sum(), but I'm curious about how Parallel Assign() works.
I've read this, and I'm guessing the slowdown is because it creates copies of very large matrices for each thread?
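For scale, the size of `m` can be worked out from the script itself, before considering any per-thread copies. Each 12-value cycle of `t` contains two decreases (83820 to 3300 and 56220 to 60; the wrap from 60 back to 59760 is an increase), so with $n = 12 \cdot 10^i$ rows there are $k = 2 \cdot 10^i = n/6$ day breaks. Assuming 8 bytes per element (JMP matrices hold doubles), `m` needs:

```latex
\[
\text{bytes}(m) = 8\,n\,k = 8 \cdot \left(12 \cdot 10^{i}\right)\left(2 \cdot 10^{i}\right) = 192 \cdot 10^{2i}
\]
\[
i = 3:\ 1.92 \times 10^{8}\ \text{B} \approx 192\ \text{MB}, \qquad
i = 4:\ 1.92 \times 10^{10}\ \text{B} \approx 19.2\ \text{GB}, \qquad
i = 5:\ 1.92 \times 10^{12}\ \text{B} \approx 1.92\ \text{TB}
\]
```

The J(n, N Items(day_break), -1) allocation alone reaches this size before Parallel Assign() does any work, so memory and work grow about 100x per loop iteration (quadratically in the row count), while Cumulative Sum() only touches n values.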
Does anyone have other insights into how it operates, and maybe some basic dos and don'ts for Parallel Assign()?