I reworked the graph to show the O(N) behavior and added a JSL loop that also achieves O(N) behavior. The red, green, and blue curves now measure time per concatenated row. The red and green curves stabilize around 0.3 µs/row, but the blue curve gets slower and slower per row as the problem gets bigger, because the blue loop recopies all of the previously copied data (and reallocates the matrix) each time a matrix is added to the result. The last blue dot is 180 µs/row * 3.8M rows, or about 11 minutes. The red and green dots are ~200X faster.
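
To make the growth of the blue curve concrete, here is a rough count of row copies. It is a sketch under one assumption, the one stated above: every |/= reallocates the result and copies every row accumulated so far. The symbols R and C are introduced here only for illustration; the script builds N + 1 matrices, and the i-th matrix has i rows.

```latex
\begin{align*}
R &= \sum_{i=1}^{N+1} i = \frac{(N+1)(N+2)}{2}
  && \text{rows in the final result} \\
C_{\text{blue}} &\approx \sum_{i=1}^{N+1} \frac{i(i+1)}{2} = \frac{(N+1)(N+2)(N+3)}{6}
  && \text{row copies made by the loop} \\
\frac{C_{\text{blue}}}{R} &\approx \frac{N+3}{3}
  && \text{copies per result row, growing with } N \\
C_{\text{red}} \approx C_{\text{green}} &\approx R
  && \text{each row copied about once}
\end{align*}
```

For N near 2750, R is about 3.8M rows, and the loop makes roughly 900 row copies per result row under this model. The total time works out as 180 µs/row * 3.8M rows ≈ 684 s ≈ 11.4 minutes.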

dt = New Table( "timecat", New Column( "nmats" ), New Column( "loop" ), New Column( "eval" ), New Column( "copy" ) );
dt << Graph Builder(
    Size( 1000, 500 ), Show Control Panel( 0 ), Legend Position( "Inside Left" ),
    Variables( X( :nmats ), Y( :loop ), Y( :eval, Position( 1 ) ), Y( :copy, Position( 1 ) ) ),
    Elements( Points( X, Y( 1 ), Y( 2 ), Y( 3 ), Legend( 3 ) ) ),
    SendToReport(
        Dispatch( {}, "loop", ScaleBox,
            {Scale( "Log" ), Format( "Best", 6 ), Min( 0.1 ), Max( 100 ), Inc( 1 ), Minor Ticks( 1 ),
            Add Ref Line( 0.2, "Solid", "Black", ".2", 1 ),
            Add Ref Line( 0.4, "Solid", "Black", ".4", 1 ),
            Label Row( Show Minor Labels( 0 ) )}
        ),
        Dispatch( {}, "graph title", TextEditBox, {Set Text( "smaller is better" )} ),
        Dispatch( {}, "X title", TextEditBox, {Set Text( "number of concats" )} ),
        Dispatch( {}, "Y title", TextEditBox, {Set Text( "microseconds / row" )} )
    )
);
For( N = 1, N < 3000, N = Ceiling( N * 1.2 ), // build a series of test sets of increasing size
    result1 = result2 = result3 = .; // clear previous run memory, if any
    OutputList = {}; // will be a list of matrices
    For( i = 1, i <= N + 1, i++,
        // do some calculations, generate a matrix of results, call it ThisM
        ThisM = J( i, 30, i ); // rows, cols, initial value. 30 columns makes this demo run in a reasonable time.
        Insert Into( OutputList, ThisM );
    );
    // copy: fastest, green in graph, O(N) behavior. The red eval below is simpler, so consider using it instead.
    start = HP Time();
    // find total rows and pre-allocate
    nr = 0;
    nc = N Cols( OutputList[1] );
    For( i = 1, i <= N Items( OutputList ), i += 1,
        nr += N Rows( OutputList[i] );
        If( nc != N Cols( OutputList[i] ),
            Throw( "bad cols" )
        );
    );
    result3 = J( nr, nc, . ); // allocate the result, exactly one time
    nr = 0;
    For( i = 1, i <= N Items( OutputList ), i += 1,
        result3[nr + 1 :: nr + N Rows( OutputList[i] ), 0] = OutputList[i]; // copy each piece into the pre-allocated result
        nr += N Rows( OutputList[i] ); // position the next piece after this one
    );
    t3 = HP Time() - start;
    // loop: slowest, blue in graph. Terrible O(N^2) behavior: the result is reallocated, bigger and bigger, on every pass.
    start = HP Time();
    result1 = OutputList[1];
    For( i = 2, i <= N Items( OutputList ), i += 1,
        result1 |/= OutputList[i]; // the vconcat-into operator: copies all the previous data into a bigger matrix each time
    );
    t1 = HP Time() - start;
    // eval: fast, red in graph. Runs last because it modifies OutputList. Simpler than the green copy, with the same O(N) behavior.
    start = HP Time();
    Substitute Into( OutputList, {}, Expr( vconcat ) ); // convert {a,b,c} -> vconcat(a,b,c)
    result2 = Eval( OutputList ); // evaluate the expression
    t2 = HP Time() - start;
    // capture the timings as microseconds per row (HP Time() returns microseconds)
    dt << Add Rows( 1 );
    dt:nmats[N Rows( dt )] = N;
    dt:loop[N Rows( dt )] = t1 / N Rows( result1 );
    dt:eval[N Rows( dt )] = t2 / N Rows( result2 );
    dt:copy[N Rows( dt )] = t3 / N Rows( result3 );
    If( !All( result1 == result2 ) | !All( result1 == result3 ) | !(N Rows( result1 ) == N Rows( result2 ) == N Rows( result3 )),
        Throw( "bug?" ) // they should be identical
    );
    Wait( .1 ); // watch the graph grow
);
Craige