- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
How do I multiply two tables together
I have a big data table and a normalization factor table.
The data table looks like this:
ID | Type | parameter1 | parameter2 | parameter3 |
01 | 1 | data | data | data |
02 | 1 | data | data | data |
03 | 2 | data | data | data |
04 | 2 | data | data | data |
The normalization factor table looks like this:
Type | parameter1 | parameter2 | parameter3 |
1 | normalization factor1 | normalization factor2 | normalization factor3 |
2 | normalization factor4 | normalization factor5 | normalization factor6 |
How can I multiply the first table by the second table based on type and parameter and obtain a new table like this:
ID | Type | parameter1 | parameter2 | parameter3 |
01 | 1 | data*normalization factor1 | data*normalization factor2 | data*normalization factor3 |
02 | 1 | data*normalization factor1 | data*normalization factor2 | data*normalization factor3 |
03 | 2 | data*normalization factor4 | data*normalization factor5 | data*normalization factor6 |
04 | 2 | data*normalization factor4 | data*normalization factor5 | data*normalization factor6 |
My raw data table is huge in both rows and columns so I want a fast solution.
Thank you very much.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
Hi,
I took a look at this and noticed a performance wall once the table got near the size of yours (actually, well before that.) I was able to improve things a bit once I began to process the table in "blocks" of about 10K rows by 1K columns. You could play with the size to see what works best, but these dimensions seemed to work well.
This method does not create a new table--it overwrites the original... so you'd want to save the original, then save the result under a different name.
The script below makes a couple of assumptions, as it is just meant to illustrate the overall concept--
- corresponding columns are in the same order in each table.
- for p parameters, the main table has p+2 columns, and the factor table has p+1 columns. there are no other columns in either table.
The actual processing time for a 1million row table with a thousand columns was about 40 seconds... each "block" took about 0.3 - 0.4 seconds to process.
The table setup portion, to mock the tables you have, took a while... probably a couple of minutes. Currently the script is set for a smaller table... modify the lines under the //table settings comment to change this.
The code is neither elegant nor compact, but it runs faster than anything else I could come up with (including using element-wise multiplication on the entire block at once... for some reason, iterating over the columns took about half the time.) I didn't try Craige's formula-based approach, though.
The initial sort is needed to keep from having to fetch rows using where() (if you have to do that, it slows things considerably.)
Cheers,
Brady
Names Default To Here( 1 );
//set limits on block size
blockRowLimit = 1e4;
blockColLimit = 1e3;
// table setup
rowsdt1 = 1e4;
colsdt1 = 1e2;
dt1 = As Table( (1 :: rowsDt1)` || J( rowsDt1, 1, Random Integer( 1, 2 ) ) || J( rowsdt1, colsdt1, Random Uniform() ), <<invisible );
Column( dt1, 1 ) << set name( "ID" );
Column( dt1, 2 ) << set name( "Type" );
dt2 = As Table( [1 1, 2 2] || J( 2, colsdt1, Random Uniform() ), <<invisible );
Column( dt2, 1 ) << set name( "Type" );
dt2 << Delete Columns( 2 );
////processing begins here
dt1 << sort( by( :type ), order( ascending ), replacetable );
type2Begin = Contains( dt1:type << getasmatrix, 2 ); //row # of first occurrence of type == 2 in sorted table.
//build matrices that show where row & column divisions of "blocks" will be
rowBreaks = (1 :: N Row( dt1 ) :: blockRowLimit) || Matrix( type2Begin ) || Matrix( N Row( dt1 ) );
rowbreaks = rowbreaks[Rank( rowbreaks )]; //sort the matrix
colBreaks = (3 :: N Col( dt1 ) :: blockColLimit) || Matrix( N Col( dt1 ) );
colBreaks = colBreaks[Rank( colBreaks )]; //sort the matrix
nrowBreaks = N Row( rowBreaks );
ncolBreaks = N Row( colBreaks );
For( i = 1, i < nrowBreaks, i++,
lowRow = rowBreaks[i];
highRow = if ( i == nRowBreaks - 1, rowBreaks[i+1], rowBreaks[i+1] -1 ); //we don't need to subtract 1 off the last upper bound.
rowRange = lowRow :: highRow;
If( rowrange[1] < type2Begin, factorRow = 1, factorRow = 2); //determine which row of factors we'll work with for this row range of data
For( j = 1, j < ncolBreaks, j++,
lowCol = colBreaks[j];
highCol = if ( j == ncolBreaks - 1, colbreaks[j+1], colbreaks[j+1] -1 ); //we don't need to subtract 1 off the last upper bound.
For( c = lowCol, c <= highCol, c++,
dt1[rowRange, c] *= dt2[factorRow, c - 1] // -1 because dt1 has an ID column while dt2 does not
);
);
);
dt1 << bring window to front;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
You could use a virtual join and formula columns.
Virtual join adds the blue and gold keys
You can link them interactively or with JSL.
This may be using JMP16 names with the :"parameter3[Type]"n
syntax; you might need name("parameter3[Type]"
) instead.
// make example tables. **save them** so virtual join works.
factorDT = New Table( "Imported Data 2",
New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2] ) ),
New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 1.2] ) ),
New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0.8, 0.9] ) ),
New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 0.9] ) )
);
factorDT << save( "$temp/factor.jmp" );
dataDT = New Table( "Imported Data",
New Column( "ID", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 1, 2, 2] ) ),
New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [4, 3, 2, 1] ) ),
New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [2, 3, 3, 2] ) )
);
dataDT << save( "$temp/data.jmp" );
// join the tables
factorDT:type << Set Property( "Link ID", 1 );
dataDT:type << Set Property( "Link Reference", Reference Table( "factor.jmp" ) );
// add formula columns
dataDT << New Column( "p 1", Numeric, "Continuous", Format( "Best", 12 ), Formula( :"parameter1[Type]"n * :parameter1 ) );
dataDT << New Column( "p 2", Numeric, "Continuous", Format( "Best", 12 ), Formula( :"parameter2[Type]"n * :parameter2 ) );
dataDT << New Column( "p 3", Numeric, "Continuous", Format( "Best", 12 ), Formula( :"parameter3[Type]"n * :parameter3 ) );
// force the evaluation to complete
dataDT<<RunFormulas;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
Thank you for your reply. However, I have many columns and column names are complicated. Therefore I want to write a for loop to create new columns. Can you explain more on this statement?
Formula( :"parameter1[Type]"n * :parameter1 )
I am thinking about writing a for loop like this:
colnames = dataDT << Get Column Names();
nc = ncols(dataDT);
DT = New Table("normalized", New Column("ID", values(dataDT:ID << get values)),
New Column("Type", values(dataDT:Type << get values)));
For(i = 3, i<= nc, i++,
DT << New Column( char( colnames[3]), Numeric, "Continuous", Format( "Best", 12 ), Formula( XXXX ) ));
But I don't know how to construct the formula correctly. It keeps giving me errors. Can you help me?
Thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
The loop is not easy; I think this will do it:
// make example tables. **save them** so virtual join works.
factorDT = New Table( "Imported Data 2",
New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2] ) ),
New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 1.2] ) ),
New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [0.8, 0.9] ) ),
New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1.1, 0.9] ) )
);
factorDT << save( "$temp/factor.jmp" );
dataDT = New Table( "Imported Data",
New Column( "ID", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
New Column( "Type", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 1, 2, 2] ) ),
New Column( "parameter1", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2, 3, 4] ) ),
New Column( "parameter2", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [4, 3, 2, 1] ) ),
New Column( "parameter3", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [2, 3, 3, 2] ) )
);
dataDT << save( "$temp/data.jmp" );
originalColumns = dataDT<<getColumnNames("string");
// join the tables
factorDT:type << Set Property( "Link ID", 1 );
dataDT:type << Set Property( "Link Reference", Reference Table( "factor.jmp" ) );
// add formula columns
for(i=3,i<=nitems(originalColumns),i+=1,
e = "dataDT << New Column("
|| "\!"" || originalColumns[i] || "_scaled\!"" // "parameter1_scaled"
|| ", Numeric, \!"Continuous\!", Format( \!"Best\!", 12 ), Formula( dataDT:name(\!""
|| originalColumns[i] || "[Type]" // this is the factor, in quotation marks, in name()
|| "\!") * dataDT:name(\!""
|| originalColumns[i] || // this is the data, in quotation marks, in name()
"\!")))";
// write(e,"\!n"); // dataDT << New Column("parameter1_scaled", Numeric, "Continuous", Format( "Best", 12 ), Formula( dataDT:name("parameter1[Type]") * dataDT:name("parameter1")))
eval(parse(e));
);
// force the evaluation to complete
dataDT<<RunFormulas;
- capture the original columns before adding the hidden columns with virtual join
- build the expression e using \!" to escape the embedded quotation marks
- use write() to debug your work, not print
- <<RunFormulas is important for speed; the default background evaluation of column formulas is a lot slower
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
Thank you for the updates, I tried and it works on a trial dataset. However I do meet performance problems as pointed out by another person. Therefore I accepted his answer as the best solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
And if you don't need the formulas, you could consider element wise multiplication of matrices which may or may not be 'better'. On my MacBook the code below took about 15 seconds.
NamesDefaultToHere(1);
nr = 10000; // Number of rows
nc = 10000; // Number of Columns
dt1 = AsTable(J(nr, nc, RandomInteger(1, 20)));
dt2 = AsTable(J(nr, nc, RandomUniform(0, 1)));
m1 = dt1 << getAsMatrix;
m2 = dt2 << getAsMatrix;
dtFinal = AsTable(m1 :* m2);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
Thank you, it works on a small dataset. But I have millions of rows in table 1 and only two rows in table 2. Therefore I want to avoid creating another huge table. If I can't find a better solution, I will try this method.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
Hi,
I took a look at this and noticed a performance wall once the table got near the size of yours (actually, well before that.) I was able to improve things a bit once I began to process the table in "blocks" of about 10K rows by 1K columns. You could play with the size to see what works best, but these dimensions seemed to work well.
This method does not create a new table--it overwrites the original... so you'd want to save the original, then save the result under a different name.
The script below makes a couple of assumptions, as it is just meant to illustrate the overall concept--
- corresponding columns are in the same order in each table.
- for p parameters, the main table has p+2 columns, and the factor table has p+1 columns. there are no other columns in either table.
The actual processing time for a 1million row table with a thousand columns was about 40 seconds... each "block" took about 0.3 - 0.4 seconds to process.
The table setup portion, to mock the tables you have, took a while... probably a couple of minutes. Currently the script is set for a smaller table... modify the lines under the //table settings comment to change this.
The code is neither elegant nor compact, but it runs faster than anything else I could come up with (including using element-wise multiplication on the entire block at once... for some reason, iterating over the columns took about half the time.) I didn't try Craige's formula-based approach, though.
The initial sort is needed to keep from having to fetch rows using where() (if you have to do that, it slows things considerably.)
Cheers,
Brady
Names Default To Here( 1 );
//set limits on block size
blockRowLimit = 1e4;
blockColLimit = 1e3;
// table setup
rowsdt1 = 1e4;
colsdt1 = 1e2;
dt1 = As Table( (1 :: rowsDt1)` || J( rowsDt1, 1, Random Integer( 1, 2 ) ) || J( rowsdt1, colsdt1, Random Uniform() ), <<invisible );
Column( dt1, 1 ) << set name( "ID" );
Column( dt1, 2 ) << set name( "Type" );
dt2 = As Table( [1 1, 2 2] || J( 2, colsdt1, Random Uniform() ), <<invisible );
Column( dt2, 1 ) << set name( "Type" );
dt2 << Delete Columns( 2 );
////processing begins here
dt1 << sort( by( :type ), order( ascending ), replacetable );
type2Begin = Contains( dt1:type << getasmatrix, 2 ); //row # of first occurrence of type == 2 in sorted table.
//build matrices that show where row & column divisions of "blocks" will be
rowBreaks = (1 :: N Row( dt1 ) :: blockRowLimit) || Matrix( type2Begin ) || Matrix( N Row( dt1 ) );
rowbreaks = rowbreaks[Rank( rowbreaks )]; //sort the matrix
colBreaks = (3 :: N Col( dt1 ) :: blockColLimit) || Matrix( N Col( dt1 ) );
colBreaks = colBreaks[Rank( colBreaks )]; //sort the matrix
nrowBreaks = N Row( rowBreaks );
ncolBreaks = N Row( colBreaks );
For( i = 1, i < nrowBreaks, i++,
lowRow = rowBreaks[i];
highRow = if ( i == nRowBreaks - 1, rowBreaks[i+1], rowBreaks[i+1] -1 ); //we don't need to subtract 1 off the last upper bound.
rowRange = lowRow :: highRow;
If( rowrange[1] < type2Begin, factorRow = 1, factorRow = 2); //determine which row of factors we'll work with for this row range of data
For( j = 1, j < ncolBreaks, j++,
lowCol = colBreaks[j];
highCol = if ( j == ncolBreaks - 1, colbreaks[j+1], colbreaks[j+1] -1 ); //we don't need to subtract 1 off the last upper bound.
For( c = lowCol, c <= highCol, c++,
dt1[rowRange, c] *= dt2[factorRow, c - 1] // -1 because dt1 has an ID column while dt2 does not
);
);
);
dt1 << bring window to front;
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
Hi
I have two matrices to multiple together, for example, the first one is 3X4 matrix and the second is 4X2 matrix. Can anybody suggest how can perform this matrix multiplication in JMP Pro 16?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: How do I multiply two tables together
Here is an example
mat1 = [ 1 2 3, 4 5 6, 7 8 9, 10 11 12];
mat2 = [1 2 3 4, 5 6 7 8, 9 10 11 12];
finalMat = mat1 * mat2;
Here is an example where the data are in data tables
names default to here(1);
dt1 = data table("one");
dt2 = data table("two");
finalMat=dt1[0,0]*dt2[0,0];