Data visualization with t-SNE and UMAP
MJ

Description

Recently, non-linear dimension-reduction and visualization algorithms, most notably t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), have been widely applied in research areas such as image processing, text mining, and genomics. This add-in provides access to the R packages for both t-SNE and UMAP. It offers a user-friendly interface for data table navigation, data quality control, sparsity handling, intuitive parameterization, and interactive interpretation of results.

 

Usage Example

Here is a screenshot of the interface with MNIST data loaded. Under Model Specifications, I selected the label column as Label and all the pixels as predictors. I chose both t-SNE and UMAP as the algorithms.

 

Embedding_Interface.png

 

Here is another screenshot, showing the results of both t-SNE (top) and UMAP (bottom).

 

tSNE&UMAP.png

 

This add-in also supports some basic quality control options, including missing value checks, distribution summaries, and sparsity calculation. You can find these under Quality Control Options in the interface.
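
As a rough illustration of what these checks compute, here is a minimal Python sketch. It is not the add-in's actual code (the add-in is written in JSL and R), and it assumes that sparsity means the fraction of zero-valued cells in the predictor columns. The function names `sparsity` and `count_missing` are hypothetical.

```python
import math


def sparsity(rows, is_empty=lambda v: v == 0):
    """Return the fraction of cells that count as empty.

    Assumption: a cell is "empty" when it equals zero. Pass a different
    is_empty function if your definition differs.
    """
    total = 0
    empty = 0
    for row in rows:
        for value in row:
            total += 1
            if is_empty(value):
                empty += 1
    return empty / total if total else 0.0


def count_missing(rows):
    """Count cells that are None or NaN."""
    return sum(
        1
        for row in rows
        for value in row
        if value is None or (isinstance(value, float) and math.isnan(value))
    )


# A tiny made-up table: 2 rows, 3 predictor columns.
table = [
    [0, 1.5, 0],
    [2.0, 0, float("nan")],
]
print(sparsity(table))       # 0.5 (3 zero cells out of 6)
print(count_missing(table))  # 1
```

A high sparsity value or a non-zero missing count is a cue to inspect the data before running either algorithm.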

 

Update: The JMP R interface on Mac has versioning issues. If you are a Mac user, please downgrade R to version 3.3.3 or earlier and use t-SNE only.

 

Changelog:

v.2 <9/22/2020> Added t-SNE and UMAP results to the original table and reorganized the output for a better exploration experience.

v.1.2 <3/14/2019> Fixed an issue with the Rtsne package where a large number of columns caused a stack overflow.
v.1.1 <3/8/2019> Fixed a bug that could produce the “issues found in R, memory exhausted?” error message. Added a submenu for the MNIST example dataset.

v.1.0 <2/26/2019> Initial version.

Comments
FN

Hello MJ,

 

Would it be possible to get the source code for the add-in?

 

I would like to use the same UI but with Python packages as the backend.

 

Thank you.

MJ

@FN Yes, the add-in is basically a zipped file. You should be able to unzip it with any decompression tool and find embedding.jsl in the script folder. Modifying the talk2R() function in that script should do it.

Hello, 

I am getting this error. I've reinstalled R 4.0.3 and the UMAP package from CRAN. The connection is there, but something isn't connecting.

MikeDereviankin_0-1605560671955.png

 

{"UMAP"}
{3, 15, 200, 0.1}
"Got algr!"
"Got parameters!"
"Start inDataPrep"
{"2,3,7,8-TCDD", "1,2,3,7,8-PeCDD", "1,2,3,4,7,8-HxCDD", "1,2,3,6,7,8-HxCDD", "1,2,3,7,8,9-HxCDD", "2,3,7,8-TCDF", "1,2,3,7,8-PeCDF", "2,3,4,7,8-PeCDF", "1,2,3,4,7,8-HxCDF", "1,2,3,6,7,8-HxCDF"}
"Finish Sparsity"
"Sparsity is: "
0.280112044817927
"Label is: "
{}
"Dim of inData2R is: "
119
15
"Finished data preparation."
"Start backend!"
"Label is: "
{}
"Within talk2R function."
"label is:"
{}
"UMAP is selected!"

TKIntRJMP.R version 14.0
The final R statement is incomplete.
Send data for "data4R" failed

 

MJ

@MikeDereviankin If this error started after you updated R and the UMAP package, the update could be the cause. Some JMP releases only work well with specific R versions; for example, R 3.5.2 works well with most JMP releases. You may need to consider downgrading R and trying again.

@MJ Can I have two different versions of R installed on my computer and have the JMP script use a specific one?

MJ

@MikeDereviankin Yes, please see the following information. 

Please try setting R_HOME as an environment variable as follows, then try the add-in again. Open a Windows CMD console and type: setx R_HOME "This PC\Documents\R\R-3.5.2". Also make sure the Rtsne and umap packages are installed for this version of R. Please let me know if this solves your problem.
Also, there are a few threads on the community discussing this issue that you can check out.
https://community.jmp.com/t5/Administration-Discussions/Help-JMP-find-R-installation/td-p/6357 
https://community.jmp.com/t5/Discussions/Setting-path-to-R-location/td-p/59764 
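
If it is unclear whether the variable took effect, a quick check outside JMP can help. The following is a minimal Python sketch, not part of the add-in. The helper name `check_r_home` is hypothetical, and it only verifies that R_HOME is set and points at an existing folder, not that JMP can actually talk to that R installation.

```python
import os


def check_r_home(env=None):
    """Return (ok, message) describing the state of the R_HOME variable.

    env defaults to the current process environment. Note that setx only
    affects consoles opened after it runs, so use a fresh console.
    """
    env = os.environ if env is None else env
    r_home = env.get("R_HOME")
    if not r_home:
        return False, "R_HOME is not set"
    if not os.path.isdir(r_home):
        return False, f"R_HOME is set to {r_home!r}, but that folder does not exist"
    return True, f"R_HOME points at {r_home!r}"


ok, message = check_r_home()
print(message)
```

If the message says the folder does not exist, the path given to setx is the first thing to recheck.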

@MJ 

This now pops up, similar to before:

MikeDereviankin_0-1605888646646.png


Are there admin permissions I need to give? 

I gave admin permissions and now get the same error as before, "sendRdata". 

I had previously used this with R-4.0.2, and all of a sudden last week it stopped working. UMAP works fine in R itself. 

I have set R_HOME to the previous version of R, and packages from CRAN are installed to this version too. The problem is that, from the looks of it, UMAP is not compatible with 3.5.2. 

As mentioned, I had been working with 4.0.2 and everything worked just fine. 

MJ

@MikeDereviankin I was not able to reproduce this problem. With the latest R 4.0.3, JMP 15.1, and the latest UMAP and tSNE packages, both algorithms worked fine. I think what went wrong was the R Send() command, which can fail because of communication problems with a specific R version or unexpected problems in the data. Have you tried the add-in with the example MNIST dataset? Other things you can try include updating R and UMAP again, removing additional R installations, and updating JMP to a newer version if possible. You can also send me your data set to take a look at if it's not confidential (Meijian.Guan@jmp.com). Contacting technical support is always a good option too.

rexneal

MJ

Your UMAP add-in works great. I use it almost every day. Thanks for the info on perplexity.

When I repeat UMAP on the same data, I always get similar but slightly different parameters.

The difference is not significant, as the plots are always very close, but a couple of samples will vary a bit.

Why is this? Is it the random seed that changes?

 

Thanks so much. Great add-in.

 

Neal

FR60

Hi MJ. 

I'm trying to use your add-in, but even though I followed your suggestion about which R version to set as R_HOME, I still receive the following alert message: 

 

Cannot get R output.

 

Could you please help me?

 

Felice 

Hi

 

I am trying to use the add-in but get the same response, "Cannot get R output", from JMP 16.2.0 + R 4.1.2

 

= Same problem as FR60 

 

I had one of the earlier complaints that said it couldn't find R so I blew the old R away and installed the current version R 4.1.2 which generates the error above.


So it is not at all clear where the problem lies. Given how widely these tools are applied, it seems about time that SAS incorporated this capability into JMP!

 

@MJ Have you been able to help macOS users with R integration? It works great on my Windows machine. Is there any possibility of establishing this connection without downgrading my R? I often use the integration with new packages. This is for an M1 MacBook, if that makes any difference. 

Good evening, dear colleagues!

I have installed R 3.6.3 and RStudio on Windows 10 (64-bit). Both packages installed successfully, and the example file for the add-in worked well, visualizing the data properly. However, as soon as I ran the add-in on my own file (32 rows, 7 predictor columns, 1 column as "label"), the process resulted in the alert below.

Nazarkovsky_0-1671139799243.png

Could you please explain what went wrong? All the columns were numeric continuous, as in the example MNIST.jmp. The only difference was that the values in the 7 variables were non-integer, like 0.234234 or 0.472343.

I will highly appreciate your advice. 
Michael

@Nazarkovsky - have a look at the JMP 17 Pro docs (https://www.jmp.com/support/help/en/17.1/index.shtml#page/jmp/multivariate-embedding.shtml%23). t-SNE is now part of the Analyze > Multivariate > Multivariate Embedding platform in JMP Pro.  

 

Best,

 

M

Thanks, @MikeD_Anderson !
Wow, it is true. It's a pity that UMAP was not included together with t-SNE.


@Nazarkovsky  - put it in the wish list!  I agree UMAP should be in there.  If you look at how they set up the GUI, it’s pretty clear it was designed for the addition of other methods down the road.  

M