Matrix vs Data Table

May 21, 2018 2:51 PM

All,

Wondering if anyone has run similar trials timing matrices versus data tables.

```
Clear Log(); Clear Globals(); Close All(DataTables,"No Save");
// Inputs
a = 7;
// Approach 1: invisible data table with formula columns
TimerStart_1 = Tick Seconds();
dt1 = New Table("Approach-1","Invisible");
dt1 << New Column("Random-1",Numeric,Continuous,<< Set Values(Random Index(10^8,10^a)))
<< New Column("Random-2",Numeric,Continuous,<< Set Values(Random Index(10^8,10^a)))
<< New Column("Add",Numeric,Continuous,Formula(:Name("Random-1")+:Name("Random-2")))
<< New Column("Subtract",Numeric,Continuous,Formula(:Name("Random-1")-:Name("Random-2")))
<< New Column("Multiply",Numeric,Continuous,Formula(:Name("Random-1")*:Name("Random-2")))
<< New Column("Mod",Numeric,Continuous,Formula(Mod(:Name("Random-1"),:Name("Random-2"))));
TimerEnd_1 = Tick Seconds();
Show(TimerEnd_1 - TimerStart_1);
Close All(DataTables,"No Save");
// Approach 2: the same operations on matrices
TimerStart_2 = Tick Seconds();
Mat_1 = Random Index(10^8,10^a);
Mat_2 = Random Index(10^8,10^a);
Add = Mat_1 + Mat_2 ;
Difference = Mat_1 - Mat_2 ;
Prod =E Mult (Mat_1,Mat_2);
Mod = Mod(Mat_1,Mat_2);
TimerEnd_2 = Tick Seconds();
Show(TimerEnd_2 - TimerStart_2);
```

Making the data table private might shave off some more time, but in general: is this a fair comparison, and does anybody favor one data container over another solely for speed?

Best

Uday

Tags: optimization

Accepted Solution

May 21, 2018 6:10 PM
Posted in reply to message from uday_guntupalli 05/21/2018 05:51 PM

You have two questions: is this comparison fair, and do you favor one container over the other?

Regarding fairness: memory management is entirely up to the language provider. Keep in mind that tables have more methods than matrices. I would have phrased your first question as, "Is this change in performance between 1 million and 10 million rows expected?" Also, and this is only a guess, beyond 1 million rows JMP might be doing some storage compression, saving memory at the cost of time.
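One way to check whether the jump from 1 million to 10 million elements is out of line is to time the same work at several sizes and compare seconds per million elements; roughly constant values would suggest linear scaling. A minimal JSL sketch of that check for the matrix approach (the loop bounds and the variable names are illustrative choices, and Tick Seconds() has only tick-level resolution, so the smallest size will be noisy):

```jsl
// Illustrative scaling check for the matrix approach (Approach 2).
// Times the same operations at 10^5, 10^6 and 10^7 elements and reports
// seconds per million elements; names and sizes are hypothetical choices.
results = {};
For( a = 5, a <= 7, a++,
	n = 10 ^ a;
	t0 = Tick Seconds();
	m1 = Random Index( 10 ^ 8, n );
	m2 = Random Index( 10 ^ 8, n );
	s = m1 + m2;
	d = m1 - m2;
	p = E Mult( m1, m2 );
	r = Mod( m1, m2 );
	elapsed = Tick Seconds() - t0;
	// Normalize so sizes can be compared directly
	Insert Into( results, elapsed / (n / 10 ^ 6) );
	Show( a, elapsed, elapsed / (n / 10 ^ 6) );
);
```

If the per-million figure grows noticeably between a = 6 and a = 7, that points to something beyond plain linear cost; the same loop can be wrapped around the data table approaches for a like-for-like comparison.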

What I favor depends on the task. If I am running a simulation, where a method is applied many times and I am collecting summary performance, I use a matrix. Prior to JMP 13, which has better subtable referencing, if I had large tables I would work with matrices and then set the values into the table, specifically for performance, and I would __never__ use formulas for large tables.
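A minimal sketch of the "work in matrices, then set the values into the table" pattern, reusing the column names from the original post; a = 6 and the table name Matrix-Then-Table are illustrative choices:

```jsl
// Sketch: do all arithmetic on matrices, then write each result into the
// table once with Set Values, so no formula columns are involved.
a = 6;
t0 = Tick Seconds();
Mat_1 = Random Index( 10 ^ 8, 10 ^ a );
Mat_2 = Random Index( 10 ^ 8, 10 ^ a );
dt = New Table( "Matrix-Then-Table", "Invisible", Add Rows( 10 ^ a ),
	New Column( "Random-1", Numeric, Continuous, Set Values( Mat_1 ) ),
	New Column( "Random-2", Numeric, Continuous, Set Values( Mat_2 ) ),
	New Column( "Add", Numeric, Continuous, Set Values( Mat_1 + Mat_2 ) ),
	New Column( "Subtract", Numeric, Continuous, Set Values( Mat_1 - Mat_2 ) ),
	New Column( "Multiply", Numeric, Continuous, Set Values( E Mult( Mat_1, Mat_2 ) ) ),
	New Column( "Mod", Numeric, Continuous, Set Values( Mod( Mat_1, Mat_2 ) ) )
);
Show( Tick Seconds() - t0 );
Close( dt, No Save );
```

The table ends up holding plain values rather than formulas, so nothing is re-evaluated later; the trade-off is that the columns no longer update if the inputs change.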

I have attached a script that appends to your script a third approach using the JMP 13 subtable referencing syntax. The snippet below is the syntax for the Add column. [This site would not let me post the attachment, so the full script is at the end of this post.]

` dt1[0, "Add"] = dt1[0,"Random-1"]+ dt1[0,"Random-2"] ;`

Since you are interested in performance, also try this:

```
a=5;
tb1 = TickSeconds();
Mat_1 = Random Index(10^8,10^a);
te1 = TickSeconds();
tb2 = TickSeconds();
Mat_2 = Round (J(10^a, 1, Random Uniform(10^8) )*10^8,0);
te2 = TickSeconds();
show(te1-tb1, te2-tb2);
```

For a < 7, the second method is far faster than the first; at a = 7 the two are about the same, and at a = 8 the second method is much slower:

- a = 6: te1 - tb1 = 2.43333333334886; te2 - tb2 = 0.25 (second method superior)
- a = 7: te1 - tb1 = 2.48333333339542; te2 - tb2 = 2.56666666665114 (both methods the same)
- a = 8: te1 - tb1 = 3.03333333344199; te2 - tb2 = 25.8166666666511 (second method much worse)
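One thing worth keeping in mind when reading these numbers: Random Index(n, k) returns k integers from 1 to n with no duplicates (sampling without replacement), whereas filling a matrix element by element with J() draws each value independently, so repeats are allowed. The sketch below is an illustration of that difference, not the method used above; it swaps in Random Integer(1, 10^8) as the per-element generator, and a = 6 is an arbitrary size:

```jsl
// Sketch: compare sampling WITHOUT replacement (Random Index) against
// sampling WITH replacement (J + Random Integer) at the same size.
a = 6;
n = 10 ^ a;
t0 = Tick Seconds();
idx = Random Index( 10 ^ 8, n );                 // n distinct integers in 1..10^8
t1 = Tick Seconds();
ints = J( n, 1, Random Integer( 1, 10 ^ 8 ) );  // n integers in 1..10^8, repeats allowed
t2 = Tick Seconds();
Show( t1 - t0, t2 - t1 );
// Both results are n x 1 column vectors over the same range
Show( N Rows( idx ), N Rows( ints ), Min( ints ), Max( ints ) );
```

Because the two generators solve different problems (guaranteeing uniqueness versus independent draws), their costs need not scale the same way as n grows, which may be part of why the crossover appears around a = 7.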

Like you, when I see a big difference, I send a note to JMP as an FYI.

If you run the script below, beware: it closes all open data tables without saving and clears the log and globals. Run it in a new JMP session.

```
Clear Log(); Clear Globals(); Close All(DataTables,"No Save");
// Inputs
a = 7;
// Approach 1: invisible data table with formula columns
TimerStart_1 = Tick Seconds();
dt1 = New Table("Approach-1","Invisible");
dt1 << New Column("Random-1",Numeric,Continuous,<< Set Values(Random Index(10^8,10^a)))
<< New Column("Random-2",Numeric,Continuous,<< Set Values(Random Index(10^8,10^a)))
<< New Column("Add",Numeric,Continuous,Formula(:Name("Random-1")+:Name("Random-2")))
<< New Column("Subtract",Numeric,Continuous,Formula(:Name("Random-1")-:Name("Random-2")))
<< New Column("Multiply",Numeric,Continuous,Formula(:Name("Random-1")*:Name("Random-2")))
<< New Column("Mod",Numeric,Continuous,Formula(Mod(:Name("Random-1"),:Name("Random-2"))));
TimerEnd_1 = Tick Seconds();
Show(TimerEnd_1 - TimerStart_1);
Close All(DataTables,"No Save");
// Approach 2: the same operations on matrices
TimerStart_2 = Tick Seconds();
Mat_1 = Random Index(10^8,10^a);
Mat_2 = Random Index(10^8,10^a);
Add = Mat_1 + Mat_2 ;
Difference = Mat_1 - Mat_2 ;
Prod =E Mult (Mat_1,Mat_2);
Mod = Mod(Mat_1,Mat_2);
TimerEnd_2 = Tick Seconds();
Show(TimerEnd_2 - TimerStart_2);
// Approach 3: pre-sized table filled with data table subscripting (JMP 13+), no formulas
TimerStart_3 = Tick Seconds();
dt1 = New Table("Approach-3","Invisible", add rows(10^a),
New Column("Random-1",Numeric,Continuous ),
New Column("Random-2",Numeric,Continuous ),
New Column("Add",Numeric,Continuous ),
New Column("Subtract",Numeric,Continuous ),
New Column("Multiply",Numeric,Continuous ),
New Column("Mod",Numeric,Continuous)
);
// Column(dt1, "Random-1") << Set Values( Random Index(10^8,10^a) );
// Column(dt1, "Random-2") << Set Values( Random Index(10^8,10^a) );
// Column(dt1, "Add") << Set Values( dt1[0,"Random-1"]+ dt1[0,"Random-2"] );
// Column(dt1, "Subtract") << Set Values( dt1[0,"Random-1"]- dt1[0,"Random-2"] );
// Column(dt1, "Multiply") << Set Values( dt1[0,"Random-1"]:* dt1[0,"Random-2"] );
// Column(dt1, "Mod") << Set Values( Mod(dt1[0,"Random-1"], dt1[0,"Random-2"]) );
dt1[0,"Random-1"] = Random Index(10^8,10^a) ;
dt1[0,"Random-2"] = Random Index(10^8,10^a) ;
dt1[0, "Add"] = dt1[0,"Random-1"]+ dt1[0,"Random-2"] ;
dt1[0, "Subtract"] = dt1[0,"Random-1"]- dt1[0,"Random-2"] ;
dt1[0, "Multiply"] = dt1[0,"Random-1"]:* dt1[0,"Random-2"] ;
dt1[0, "Mod"] = Mod(dt1[0,"Random-1"], dt1[0,"Random-2"] );
TimerEnd_3 = Tick Seconds();
Show(TimerEnd_3 - TimerStart_3);
Close All(DataTables,"No Save");
```
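Formula columns created from a script may not be fully evaluated at the moment the next statement runs, so a timer stopped right after the New Column calls might not capture all of the formula work. A hedged sketch of Approach 1 that forces evaluation with the data table's Run Formulas message before reading the clock (trimmed to one formula column; the table name Approach-1b and a = 6 are illustrative):

```jsl
// Sketch: make sure the formula work is inside the timed region.
a = 6;
TimerStart_1 = Tick Seconds();
dt1 = New Table( "Approach-1b", "Invisible", Add Rows( 10 ^ a ),
	New Column( "Random-1", Numeric, Continuous, Set Values( Random Index( 10 ^ 8, 10 ^ a ) ) ),
	New Column( "Random-2", Numeric, Continuous, Set Values( Random Index( 10 ^ 8, 10 ^ a ) ) ),
	New Column( "Add", Numeric, Continuous, Formula( :Name( "Random-1" ) + :Name( "Random-2" ) ) )
);
// Evaluate any pending formulas now, before stopping the timer
dt1 << Run Formulas;
TimerEnd_1 = Tick Seconds();
Show( TimerEnd_1 - TimerStart_1 );
Close( dt1, No Save );
```

Timing with and without the Run Formulas line is a quick way to see whether the original Approach 1 numbers were including the formula evaluation.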

Re: Matrix vs Data Table

@gzmorgan0,

Thank you for your detailed response. Your interpretation is mostly accurate, and I wish I had provided more clarity to begin with. I share your preferences: I would use data tables when I need their additional built-in methods, and matrices otherwise. However, the question I wanted to pick the community's brain on was the speed of handling large data: whether, and why, the behavior changes as the data size increases. I would like to believe it is because of the storage compression you mention.

One interesting aspect is the amount of time saved by data table subscripting. At a = 7, it shaved a good 7 seconds relative to the traditional column-formula approach, with matrices still leading in performance.

Best

Uday
Re: Matrix vs Data Table

May 22, 2018 5:53 AM
Posted in reply to message from uday_guntupalli 05/21/2018 05:51 PM

"Making the data table private, might shave some more time"

Actually, you might find that making the table private has a substantial impact on performance.

-Dave
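A sketch for measuring this directly, assuming JMP 14 or later (where New Table accepts a "Private" option); the table and column names and a = 6 are illustrative, and the same subscripted assignment is timed in each table:

```jsl
// Sketch: time identical work in an invisible table and in a private table.
// Assumes JMP 14+ for the "Private" option; names and sizes are hypothetical.
a = 6;

t0 = Tick Seconds();
dtInv = New Table( "Invisible-Table", "Invisible", Add Rows( 10 ^ a ),
	New Column( "X", Numeric, Continuous, Set Values( Random Index( 10 ^ 8, 10 ^ a ) ) ),
	New Column( "Y", Numeric, Continuous, Set Values( Random Index( 10 ^ 8, 10 ^ a ) ) ),
	New Column( "Sum", Numeric, Continuous )
);
dtInv[0, "Sum"] = dtInv[0, "X"] + dtInv[0, "Y"];
tInv = Tick Seconds() - t0;
Close( dtInv, No Save );

t0 = Tick Seconds();
dtPriv = New Table( "Private-Table", "Private", Add Rows( 10 ^ a ),
	New Column( "X", Numeric, Continuous, Set Values( Random Index( 10 ^ 8, 10 ^ a ) ) ),
	New Column( "Y", Numeric, Continuous, Set Values( Random Index( 10 ^ 8, 10 ^ a ) ) ),
	New Column( "Sum", Numeric, Continuous )
);
dtPriv[0, "Sum"] = dtPriv[0, "X"] + dtPriv[0, "Y"];
tPriv = Tick Seconds() - t0;
Close( dtPriv, No Save );

Show( tInv, tPriv );
```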

Re: Matrix vs Data Table

May 22, 2018 6:07 AM
Posted in reply to message from David_Burnham 05/22/2018 08:53 AM

@David_Burnham,

While I agree, I am generating multiple tables iteratively, so the reference gets overwritten, and making the data table private might result in losing the table. But in general, I agree and make data tables private wherever possible.

Best

Uday
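One way around the overwritten-reference problem, sketched under two assumptions (JMP 14 or later for the "Private" option, and that a private table stays usable as long as some variable still holds a reference to it), is to collect every table reference in a list instead of relying on a single variable. The table names and sizes here are hypothetical:

```jsl
// Sketch: keep each private table reachable even though dt is reassigned
// on every iteration. Table names and sizes are hypothetical.
tables = {};
For( i = 1, i <= 3, i++,
	dt = New Table( "Iteration " || Char( i ), "Private", Add Rows( 5 ),
		New Column( "Value", Numeric, Continuous, Set Values( J( 5, 1, i ) ) )
	);
	// The list keeps its own reference after dt is overwritten
	Insert Into( tables, dt );
);
// Every table is still reachable through the list
For( i = 1, i <= N Items( tables ), i++,
	Show( N Rows( tables[i] ) );
);
// Close the tables explicitly when finished
For( i = 1, i <= N Items( tables ), i++,
	Close( tables[i], No Save );
);
```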