cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Choose Language Hide Translation Bar
Angela24996
Level II

How do I multiply two tables together

I have a big data table and a normalization factor table.

The data table looks like this:

IDTypeparameter1parameter2parameter3
011datadatadata
021datadatadata
032datadatadata
042datadata data

The normalization factor table looks like this:

Typeparameter1parameter2parameter3
1normalization factor1normalization factor2normalization factor3
2normalization factor4normalization factor5normalization factor6

How can I multiply the first table by the second table based on type and parameter and obtain a new table like this:

IDTypeparameter1parameter2parameter3
011data*normalization factor1data*normalization factor2data*normalization factor3
021data*normalization factor1data*normalization factor2data*normalization factor3
032data*normalization factor4data*normalization factor5data*normalization factor6
042data*normalization factor4data*normalization factor5 data*normalization factor6

My raw data table is huge in both rows and columns so I want a fast solution.

Thank you very much.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: How do I multiply two tables together

Hi,

 

I took a look at this and noticed a performance wall once the table got near the size of yours (actually, well before that.) I was able to improve things a bit once I began to process the table in "blocks" of about 10K rows by 1K columns. You could play with the size to see what works best, but these dimensions seemed to work well.

 

This method does not create a new table--it overwrites the original... so you'd want to save the original, then save the result under a different name.

 

The script below makes a couple of assumptions, as it is just meant to illustrate the overall concept--

- corresponding columns are in the same order in each table.

- for p parameters, the main table has p+2 columns, and the factor table has p+1 columns. there are no other columns in either table.

 

The actual processing time for a 1million row table with a thousand columns was about 40 seconds... each "block" took about 0.3 - 0.4 seconds to process.

 

The table setup portion, to mock the tables you have, took a while... probably a couple of minutes. Currently the script is set for a smaller table... modify the lines under the //table settings comment to change this.

 

The code is neither elegant nor compact, but it runs faster than anything else I could come up with (including using element-wise multiplication on the entire block at once... for some reason, iterating over the columns took about half the time.) I didn't try Craige's formula-based approach, though.

 

The initial sort is needed to keep from having to fetch rows using where() (if you have to do that, it slows things considerably.)

 

Cheers,

Brady

 

Names Default To Here( 1 );

//set limits on block size
blockRowLimit = 1e4;
blockColLimit = 1e3;

// table setup
rowsdt1 = 1e4;
colsdt1 = 1e2;

dt1 = As Table( (1 :: rowsDt1)` || J( rowsDt1, 1, Random Integer( 1, 2 ) ) || J( rowsdt1, colsdt1, Random Uniform() ), <<invisible );
Column( dt1, 1 ) << set name( "ID" );
Column( dt1, 2 ) << set name( "Type" );

dt2 = As Table( [1 1, 2 2] || J( 2, colsdt1, Random Uniform() ), <<invisible );
Column( dt2, 1 ) << set name( "Type" );
dt2 << Delete Columns( 2 );


////processing begins here

dt1 << sort( by( :type ), order( ascending ), replacetable );
type2Begin = Contains( dt1:type << getasmatrix, 2 );		//row # of first occurrence of type == 2 in sorted table.

//build matrices that show where row & column divisions of "blocks" will be
rowBreaks = (1 :: N Row( dt1 ) :: blockRowLimit) || Matrix( type2Begin ) || Matrix( N Row( dt1 ) );
rowbreaks = rowbreaks[Rank( rowbreaks )]; //sort the matrix

colBreaks = (3 :: N Col( dt1 ) :: blockColLimit) || Matrix( N Col( dt1 ) );
colBreaks = colBreaks[Rank( colBreaks )]; //sort the matrix

nrowBreaks = N Row( rowBreaks );
ncolBreaks = N Row( colBreaks );

For( i = 1, i < nrowBreaks, i++,
	lowRow = rowBreaks[i];
	highRow = if ( i == nRowBreaks - 1, rowBreaks[i+1], rowBreaks[i+1] -1 ); //we don't need to subtract 1 off the last upper bound.
	rowRange = lowRow :: highRow;
	
	If( rowrange[1] < type2Begin, factorRow = 1, factorRow = 2); //determine which row of factors we'll work with for this row range of data
	
	For( j = 1, j < ncolBreaks, j++,		
		lowCol = colBreaks[j];
		highCol = if ( j == ncolBreaks - 1, colbreaks[j+1], colbreaks[j+1] -1 );  //we don't need to subtract 1 off the last upper bound.
		For( c = lowCol, c <= highCol, c++,
			dt1[rowRange, c] *= dt2[factorRow, c - 1]   // -1 because dt1 has an ID column while dt2 does not
		);				
	);
);

dt1 << bring window to front;

 

View solution in original post

9 REPLIES 9
Craige_Hales
Super User

Re: How do I multiply two tables together

You could use a virtual join and formula columns.

Virtual join adds the blue and gold keysVirtual join adds the blue and gold keys

You can link them interactively or with JSL.

This may be using JMP16 names with the  :"parameter3[Type]"n syntax; you might need name("parameter3[Type]") instead.

// make example tables. **save them** so virtual join works.
factorDT = New Table( "Imported Data 2",
	New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2] ) ),
	New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 1.2] ) ),
	New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0.8, 0.9] ) ),
	New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 0.9] ) )
);
factorDT << save( "$temp/factor.jmp" );
dataDT = New Table( "Imported Data",
	New Column( "ID", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
	New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 1, 2, 2] ) ),
	New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
	New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [4, 3, 2, 1] ) ),
	New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [2, 3, 3, 2] ) )
);
dataDT << save( "$temp/data.jmp" );

// join the tables
factorDT:type << Set Property( "Link ID", 1 );
dataDT:type << Set Property( "Link Reference", Reference Table( "factor.jmp" ) );

// add formula columns
dataDT << New Column( "p 1", Numeric, "Continuous", Format( "Best", 12 ), Formula( :"parameter1[Type]"n * :parameter1 ) );
dataDT << New Column( "p 2", Numeric, "Continuous", Format( "Best", 12 ), Formula( :"parameter2[Type]"n * :parameter2 ) );
dataDT << New Column( "p 3", Numeric, "Continuous", Format( "Best", 12 ), Formula( :"parameter3[Type]"n * :parameter3 ) );

// force the evaluation to complete
dataDT<<RunFormulas;

 

 

 

Craige
Angela24996
Level II

Re: How do I multiply two tables together

Thank you for your reply. However, I have many columns and column names are complicated. Therefore I want to write a for loop to create new columns. Can you explain more on this statement?

Formula( :"parameter1[Type]"n * :parameter1 )

 I am thinking about writing a for loop like this:

 

colnames = dataDT << Get Column Names();
nc = ncols(dataDT);

DT = New Table("normalized", New Column("ID", values(dataDT:ID << get values)),
	New Column("Type", values(dataDT:Type << get values)));
For(i = 3, i<= nc, i++,
DT << New Column( char( colnames[3]), Numeric, "Continuous", Format( "Best", 12 ), Formula( XXXX ) ));

But I don't know how to construct the formula correctly. It keeps giving me errors. Can you help me?

Thank you.

 

Craige_Hales
Super User

Re: How do I multiply two tables together

The loop is not easy; I think this will do it:

// make example tables. **save them** so virtual join works.
factorDT = New Table( "Imported Data 2",
	New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2] ) ),
	New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 1.2] ) ),
	New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0.8, 0.9] ) ),
	New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 0.9] ) )
);
factorDT << save( "$temp/factor.jmp" );
dataDT = New Table( "Imported Data",
	New Column( "ID", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
	New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 1, 2, 2] ) ),
	New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
	New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [4, 3, 2, 1] ) ),
	New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [2, 3, 3, 2] ) )
);
dataDT << save( "$temp/data.jmp" );

originalColumns = dataDT<<getColumnNames("string");

// join the tables
factorDT:type << Set Property( "Link ID", 1 );
dataDT:type << Set Property( "Link Reference", Reference Table( "factor.jmp" ) );

// add formula columns
for(i=3,i<=nitems(originalColumns),i+=1,
	e = "dataDT << New Column("
		|| "\!"" || originalColumns[i] || "_scaled\!"" // "parameter1_scaled"
		|| ", Numeric, \!"Continuous\!", Format( \!"Best\!", 12 ), Formula( dataDT:name(\!"" 
		|| originalColumns[i] || "[Type]" // this is the factor, in quotation marks, in name()
		|| "\!") * dataDT:name(\!"" 
		|| originalColumns[i] || // this is the data, in quotation marks, in name()
		"\!")))";
	// write(e,"\!n"); // dataDT << New Column("parameter1_scaled", Numeric, "Continuous", Format( "Best", 12 ), Formula( dataDT:name("parameter1[Type]") * dataDT:name("parameter1")))
	eval(parse(e));
);
// force the evaluation to complete
dataDT<<RunFormulas;
  • capture the original columns before adding the hidden columns with virtual join
  • build the expression e using \!" to escape the embedded quotation marks
  • use write() to debug your work, not print
  • <<RunFormulas is important for speed; the default background evaluation of column formulas is a lot slower

 

Craige
Angela24996
Level II

Re: How do I multiply two tables together

Thank you for the updates, I tried and it works on a trial dataset. However I do meet performance problems as pointed out by another person. Therefore I accepted his answer as the best solution.

ian_jmp
Staff

Re: How do I multiply two tables together

And if you don't need the formulas, you could consider element wise multiplication of matrices which may or may not be 'better'. On my MacBook the code below took about 15 seconds.

NamesDefaultToHere(1);

nr = 10000; 	// Number of rows
nc = 10000;		// Number of Columns

dt1 = AsTable(J(nr, nc, RandomInteger(1, 20)));
dt2 = AsTable(J(nr, nc, RandomUniform(0, 1)));

m1 = dt1 << getAsMatrix;
m2 = dt2 << getAsMatrix;

dtFinal = AsTable(m1 :* m2);

 

Angela24996
Level II

Re: How do I multiply two tables together

Thank you, it works on a small dataset. But I have millions of rows in table 1 and only two rows in table 2. Therefore I want to avoid creating another huge table. If I can't find a better solution, I will try this method.

Re: How do I multiply two tables together

Hi,

 

I took a look at this and noticed a performance wall once the table got near the size of yours (actually, well before that.) I was able to improve things a bit once I began to process the table in "blocks" of about 10K rows by 1K columns. You could play with the size to see what works best, but these dimensions seemed to work well.

 

This method does not create a new table--it overwrites the original... so you'd want to save the original, then save the result under a different name.

 

The script below makes a couple of assumptions, as it is just meant to illustrate the overall concept--

- corresponding columns are in the same order in each table.

- for p parameters, the main table has p+2 columns, and the factor table has p+1 columns. there are no other columns in either table.

 

The actual processing time for a 1million row table with a thousand columns was about 40 seconds... each "block" took about 0.3 - 0.4 seconds to process.

 

The table setup portion, to mock the tables you have, took a while... probably a couple of minutes. Currently the script is set for a smaller table... modify the lines under the //table settings comment to change this.

 

The code is neither elegant nor compact, but it runs faster than anything else I could come up with (including using element-wise multiplication on the entire block at once... for some reason, iterating over the columns took about half the time.) I didn't try Craige's formula-based approach, though.

 

The initial sort is needed to keep from having to fetch rows using where() (if you have to do that, it slows things considerably.)

 

Cheers,

Brady

 

Names Default To Here( 1 );

//set limits on block size
blockRowLimit = 1e4;
blockColLimit = 1e3;

// table setup
rowsdt1 = 1e4;
colsdt1 = 1e2;

dt1 = As Table( (1 :: rowsDt1)` || J( rowsDt1, 1, Random Integer( 1, 2 ) ) || J( rowsdt1, colsdt1, Random Uniform() ), <<invisible );
Column( dt1, 1 ) << set name( "ID" );
Column( dt1, 2 ) << set name( "Type" );

dt2 = As Table( [1 1, 2 2] || J( 2, colsdt1, Random Uniform() ), <<invisible );
Column( dt2, 1 ) << set name( "Type" );
dt2 << Delete Columns( 2 );


////processing begins here

dt1 << sort( by( :type ), order( ascending ), replacetable );
type2Begin = Contains( dt1:type << getasmatrix, 2 );		//row # of first occurrence of type == 2 in sorted table.

//build matrices that show where row & column divisions of "blocks" will be
rowBreaks = (1 :: N Row( dt1 ) :: blockRowLimit) || Matrix( type2Begin ) || Matrix( N Row( dt1 ) );
rowbreaks = rowbreaks[Rank( rowbreaks )]; //sort the matrix

colBreaks = (3 :: N Col( dt1 ) :: blockColLimit) || Matrix( N Col( dt1 ) );
colBreaks = colBreaks[Rank( colBreaks )]; //sort the matrix

nrowBreaks = N Row( rowBreaks );
ncolBreaks = N Row( colBreaks );

For( i = 1, i < nrowBreaks, i++,
	lowRow = rowBreaks[i];
	highRow = if ( i == nRowBreaks - 1, rowBreaks[i+1], rowBreaks[i+1] -1 ); //we don't need to subtract 1 off the last upper bound.
	rowRange = lowRow :: highRow;
	
	If( rowrange[1] < type2Begin, factorRow = 1, factorRow = 2); //determine which row of factors we'll work with for this row range of data
	
	For( j = 1, j < ncolBreaks, j++,		
		lowCol = colBreaks[j];
		highCol = if ( j == ncolBreaks - 1, colbreaks[j+1], colbreaks[j+1] -1 );  //we don't need to subtract 1 off the last upper bound.
		For( c = lowCol, c <= highCol, c++,
			dt1[rowRange, c] *= dt2[factorRow, c - 1]   // -1 because dt1 has an ID column while dt2 does not
		);				
	);
);

dt1 << bring window to front;

 

madhu
Level III

Re: How do I multiply two tables together

Hi

I have two matrices to multiple together, for example, the first one is 3X4 matrix and the second is 4X2 matrix. Can anybody suggest how can perform this matrix multiplication in JMP Pro 16?

txnelson
Super User

Re: How do I multiply two tables together

Here is an example

mat1 = [ 1 2 3, 4 5 6, 7 8 9, 10 11 12];
mat2 = [1 2 3 4, 5 6 7 8, 9 10 11 12];
finalMat = mat1 * mat2;

Here is an example where the data are in data tables

names default to here(1);
dt1 = data table("one");
dt2 = data table("two");
finalMat=dt1[0,0]*dt2[0,0];
Jim