ashwint27
Level II

Initialize data across multiple columns at once

I would like to initialize several columns in the same way, all at once. For example, I want to create M columns of random normal data (mean = x, stdev = y), each with N rows. Any help is appreciated. Thanks.

 


Re: Initialize data across multiple columns at once

Based on the limited information provided, this script proves the concept.

 

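Names Default To Here( 1 );

M = 10; // number of columns
N = 30; // number of rows

// J() evaluates its third argument once per element, so every cell gets its own draw
data = J( N, M, Random Normal() );

// Convert the N x M matrix into a data table with M columns
As Table( data );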

Re: Initialize data across multiple columns at once

Adding to what Mark supplied: Random Normal ( ) can be used without arguments, or with 2 arguments.

 

Without arguments, you get N( mean = 0, stddev = 1 ) data. So Mark's code will give you an N x M table (N rows, M columns) of N(0,1) data.

 

With arguments, Random Normal( x, y ) gives you N( mean = x, stddev = y ) data.

 

Since your post mentions a mean of x and a stddev of y, you could use the following, if x is not 0 and/or y is not 1.

 

As Table( J( N, M, Random Normal( x, y ) ) );
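
For completeness, here is a minimal sketch that puts the pieces together and checks the result. The values chosen for x and y are only illustrative, and the V Mean() / V Std() check at the end is my own addition, not part of Mark's script.

```jsl
Names Default To Here( 1 );

M = 10;  // number of columns
N = 30;  // number of rows
x = 50;  // mean (illustrative value)
y = 5;   // standard deviation (illustrative value)

// Build the N x M matrix of N(x, y) draws, then turn it into a data table
data = J( N, M, Random Normal( x, y ) );
dt = As Table( data );

// Column-wise mean and standard deviation of the matrix;
// with only N = 30 rows these will be near x and y, not equal to them
Show( V Mean( data ), V Std( data ) );
```

Keeping the matrix in its own variable before calling As Table() makes it easy to inspect the draws without going back through the data table.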

 

 

Re: Initialize data across multiple columns at once

Here's a funny thing:

 

You might save a little bit of time using

y * J( N, M, Random Normal() ) + x

instead of

J( N, M, Random Normal( x, y ) ).

The reason is that the J function is really a loop. In the J( N, M, Random Normal( x, y ) ) example, each time through the loop a standard random normal is chosen, multiplied by y, and added to x to produce a random N(x, y) value.

In the y * J( N, M, Random Normal() ) + x example, all of the standard random normals are generated first; afterward the entire matrix is multiplied by y and x is added to every element.

 

It's not a huge difference, but in the 1000x1000 matrices I was investigating, the 2nd approach ran in about 3/4 the time of the first.
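
If anyone wants to reproduce the comparison, a rough timing sketch along these lines should do. The x and y values are placeholders, and HP Time() is just one way to time it; actual numbers will vary by machine and JMP version.

```jsl
Names Default To Here( 1 );

size = 1000;  // matrix is size x size, as in the 1000x1000 case above
x = 50;       // mean (illustrative value)
y = 5;        // standard deviation (illustrative value)

// Approach 1: pass x and y to Random Normal() for every element
t0 = HP Time();
a = J( size, size, Random Normal( x, y ) );
t1 = HP Time();

// Approach 2: draw standard normals, then scale and shift the whole matrix
b = y * J( size, size, Random Normal() ) + x;
t2 = HP Time();

// HP Time() reports microseconds, so divide to get seconds
Show( (t1 - t0) / 1000000, (t2 - t1) / 1000000 );
```

Running each approach several times and averaging would give a steadier comparison than a single pass.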

Craige_Hales
Super User

Re: Initialize data across multiple columns at once

The extra time is spent looking up the values of x and y 999,999 extra times, not in the actual multiply, which has to be done anyway. You can use numeric literals in place of x and y and get an in-between time, because a literal evaluates with no lookup in the namespace. The baseline case below does not include the multiply and add; including them would likely bring the ratio closer to the 3/4 you described.

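size = 1000; x = 0; y = 1; // setup assumed here; the x and y values are placeholders
q = J( size, size, Random Normal() );       // 1.0
q = J( size, size, Random Normal( 0, 1 ) ); // 1.1
q = J( size, size, Random Normal( x, y ) ); // 1.4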

 

Craige