- Overlay Graphs

Aug 18, 2015 2:56 PM
(7128 views)

Hi,

I've run into a problem with stacking columns. The data I look at can run to hundreds of columns. I often stack columns together so I can use a distribution graph on the single stacked column. But sometimes my data table gets into the millions of rows, and JMP starts to crash when I stack columns. Is there a way to overlay a bunch of distribution graphs to make one graph without having to stack all the columns together? That is, graph a ton of columns and overlay all those graphs into one?

Thank you for any help!

1 ACCEPTED SOLUTION

Within the hardware constraints of your machine, generally JMP is good at this kind of thing. I assume that (for some reason) it's not possible to import the data in a single column.

You can experiment with the code below. On my machine, this takes eight seconds, but more than half of that time is spent building the starting table and evaluating the formulae therein. Setting nr = 1,000,000 gives a run time of about one minute.

```
NamesDefaultToHere(1);
// Number of columns and rows
nc = 130;
nr = 100000;
// Make a test table, dt1, to work with
dt1 = NewTable("Test", NewColumn("Column 1", Numeric, Continuous, Formula(RandomNormal())),AddRows(nr));
For(i=2, i<=nc, i++, dt1 << NewColumn("Column "||Char(i), Numeric, Continuous, Formula(RandomNormal())));
// Get the data from the table into a matrix
m = dt1 << getAsMatrix;
// Close dt1 to recover some memory
Close(dt1, NoSave);
// Reshape into a column vector
m = Shape(m, NRow(m)*NCol(m), 1);
// Make a new table
dt2 = NewTable("Test in a Single Column", NewColumn("All", Numeric, Continuous, Values(m)));
// Recover memory
m = [];
```
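
To see the combined histogram, the single-column table can be sent to the Distribution platform, using the same call that appears elsewhere in this thread. The following is a minimal, self-contained sketch rather than part of the original answer: the small matrix `m` is a hypothetical stand-in for the reshaped data, and with the script above you would skip the first two statements and reuse `dt2` directly.

```
Names Default To Here( 1 );
// Hypothetical stand-in for the reshaped matrix: 1000 random normal values
m = J( 1000, 1, Random Normal() );
// Same table layout as dt2 in the script above: one column named "All"
dt2 = New Table( "Test in a Single Column",
	New Column( "All", Numeric, Continuous, Values( m ) )
);
// One histogram of every value, regardless of the column it came from
dt2 << Distribution( Continuous Distribution( Column( :All ) ) );
```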

8 REPLIES

See if Graph Builder will do what you want. Select all the columns and drop them on the Y axis; you'll see a point cloud. Here's how it goes with Big Class, using age, height, and weight.

Then right-click > Add > Histogram.

Then right-click > Points > Remove, and click Done.

Click each title and press Delete (or add a better title).

Then you can use the red triangle menu to copy the script:

```
Graph Builder(
	Size( 534, 418 ),
	Show Control Panel( 0 ),
	Variables( Y( :age ), Y( :height, Position( 1 ) ), Y( :weight, Position( 1 ) ) ),
	Elements( Histogram( Y( 1 ), Y( 2 ), Y( 3 ), Legend( 3 ) ) ),
	SendToReport(
		Dispatch( {}, "graph title", TextEditBox, {Set Text( "Students" )} ),
		Dispatch( {}, "X title", TextEditBox, {Set Wrap( 2 )} ),
		Dispatch( {}, "Y title", TextEditBox, {Set Text( "" )} ),
		Dispatch(
			{},
			"Graph Builder",
			FrameBox,
			{DispatchSeg(
				Hist Seg( "Histogram (age)" ),
				Histogram Color( -4222943 )
			), DispatchSeg(
				Hist Seg( "Histogram (height)" ),
				Histogram Color( -13977687 )
			), DispatchSeg(
				Hist Seg( "Histogram (weight)" ),
				Histogram Color( -3780931 )
			)}
		)
	)
)
```

Craige

Aug 19, 2015 9:54 AM
(6114 views)
| Posted in reply to message from Craige_Hales 08/19/2015 10:05 AM

That is helpful, but it's not quite what I'm looking for. I want it to look exactly like the distribution graph (where you select a column and graph it), except that I want to be able to select multiple columns and create one distribution. I just want one graph with the values along the bottom and the amount of data on the y-axis. Right now I have to stack all the columns I want to graph and then graph the new column, which gives exactly what I want. The only problem is that I have about 130 columns to graph together, and stacking them is unreasonable: my data table usually has hundreds of thousands of rows, and stacking two columns doubles the number of rows, so stacking 130 would crash JMP or take an enormous amount of time. The Graph Builder result tells me essentially nothing because there is no x-axis in this case, so I don't know how much each bar represents.

Is there a way to overlay the distribution graphs to make one graph of all the columns instead of separating out every column?

Thanks!

This should stack your data without duplicating too much. It deletes the label column, since you don't care which original column the values came from. **DropAllOtherColumns** may be the part you are looking for.

```
newdt = Data Table( "Big Class" ) << Stack(
	columns( :height, :weight, :age ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Drop All Other Columns( 1 ),
	Output Table( "hw" )
);
newdt << deletecolumns( "label" );
newdt << Distribution( Continuous Distribution( Column( :Data ) ) );
```

Craige

The rule of thumb that I've heard about JMP is to have twice as much memory available as your dataset size. So maybe more memory is needed?
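
As a rough illustration of that rule of thumb, the arithmetic can be sketched in JSL. The sketch assumes 8 bytes per numeric value (JMP's default storage for numeric columns) and a hypothetical one million rows; the 130 columns come from the thread. Treat the result as a ballpark figure, not a measurement.

```
Names Default To Here( 1 );
// Rough size estimate, assuming 8 bytes per numeric value
nc = 130;       // number of columns mentioned in the thread
nr = 1000000;   // hypothetical row count, chosen for illustration
bytes = nc * nr * 8;
// Size of the wide table in GB; the rule of thumb suggests about twice this in RAM
Show( bytes / 1e9 );
```

A stacked copy holds the same values a second time, so while both tables are open the requirement roughly doubles again.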

I don't think that kind of overlay is possible in the Distribution platform. In my experience stacking is quite fast. The example code below runs in a few seconds on my laptop. Increasing the number of data points to 10^8 slows it down, but the Distribution platform is the bottleneck there, not the stacking.

On another note, a histogram of millions of data points looks much the same as one of a smaller random subset of the data. So if performance is a problem, you could try to downsize the data set before visualization (or you may simply need more RAM).

```
dt = New Table("big");
nr = 100000;
nc = 100;
dt << add rows(nr);
For(i = 1, i <= nc, i++,
dt << New Column("col" || Char(i), set each value(Random Normal()))
);
dt_stacked = dt << Stack(
columns(dt << get column names),
Source Label Column("Label"),
Stacked Data Column("Data"),
Drop All Other Columns(1)
);
dt_stacked << Distribution(Continuous Distribution(Column(:Data), Outlier Box Plot(0)));
```
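
The downsizing idea mentioned above could look something like the following. This is only a sketch under stated assumptions: the table and column names are hypothetical, the 10% sampling rate is an arbitrary choice, and it relies on the `Sampling Rate` option of the `Subset` message to draw a random sample of rows.

```
Names Default To Here( 1 );
// Hypothetical test table standing in for the real (stacked) data
dt = New Table( "big sample demo" );
dt << Add Rows( 100000 );
dt << New Column( "Data", Numeric, Continuous, Set Each Value( Random Normal() ) );
// Keep a random sample of about 10% of the rows (the rate is an arbitrary choice)
dt_small = dt << Subset( Sampling Rate( 0.1 ), Selected Columns Only( 0 ) );
// Plot the smaller table; the histogram shape should be close to the full one
dt_small << Distribution( Continuous Distribution( Column( :Data ), Outlier Box Plot( 0 ) ) );
```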

sophiaw: In JMP version 12 you can create the visualization below (Big Class, Height by Sex, overlaid).

I think this is the visualization you are after? Unfortunately, you have to stack the data to get the overlay variable, and you say that is problematic from a data table management point of view. How much RAM do you have? Adding more RAM might help with the crashing. All that aside, with as many levels of the overlay variable as you suggest, I wonder how visually appealing your graph will be, let alone whether you could tell which level is which. The histogram overlay is really most effective with a relatively small number of levels for the overlay variable.

Thank you for all of your suggestions!

Unfortunately, it's not really what I'm looking for. It seems that JMP just can't do what I want: I don't want to have to stack all the columns together, but I do want the distribution graph of all the data.

DropAllOtherColumns is a useful option I did not know about. I'll have to play around with it to see whether adding it when stacking my columns improves efficiency.

Also, Peter, I don't care about seeing which value corresponds to which level in the distribution, which is why I want something equivalent to stacking but without the memory-hungry aspect. That distribution graph looks a lot like what I'm after, but unfortunately I don't have JMP version 12. I also have 130 variables. haha

Thank you!
