MJ
Staff

Data visualization with t-SNE and UMAP

Description

Recently, non-linear dimension-reduction and visualization algorithms, most notably t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), have been widely applied in research areas such as image processing, text mining, and genomics. This add-in provides access to the R packages for both t-SNE and UMAP. It offers a user-friendly interface for data table navigation, data quality control, sparsity handling, intuitive parameterization, and interactive interpretation of results.

 

Usage Example

Here is a screenshot of the interface with MNIST data loaded. Under Model Specifications, I selected the label column as Label and all the pixels as predictors. I chose both t-SNE and UMAP as the algorithms.

 

Embedding_Interface.png

 

Here is another screenshot showing the results of both t-SNE (top) and UMAP (bottom).

 

tSNE&UMAP.png

 

This add-in also supports some basic quality control options, including missing value checking, distribution, and sparsity calculation. You can find these options under Quality Control Options on the interface.
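To make the missing value and sparsity checks concrete, here is a minimal Python sketch of what such a summary might compute. It is not the add-in's code: the function name quality_summary is hypothetical, and it assumes that "sparsity" means the fraction of non-missing cells that are exactly zero, which this page does not confirm.

```python
import math


def quality_summary(rows):
    """Count cells, missing values, and the share of observed cells equal to zero.

    Assumption: sparsity = zeros / non-missing cells. The add-in may define it differently.
    """
    total = missing = zeros = 0
    for row in rows:
        for value in row:
            total += 1
            if value is None or (isinstance(value, float) and math.isnan(value)):
                missing += 1
            elif value == 0:
                zeros += 1
    observed = total - missing
    return {
        "cells": total,
        "missing": missing,
        "sparsity": zeros / observed if observed else 0.0,
    }


# Toy "pixel" table: three rows, four columns, one missing cell.
pixels = [
    [0, 0, 128, 255],
    [0, None, 0, 64],
    [0, 0, 0, 0],
]
summary = quality_summary(pixels)
print(summary)
```

For image data such as MNIST, most pixels are zero, so a high sparsity value is expected rather than a sign of a problem.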

 

Update: The JMP R interface on Mac has versioning issues. If you are a Mac user, please downgrade R to version 3.3.3 or earlier and use t-SNE only.

 

Changelog:

v.1.2 <3/14/2019> Fixed an issue with the Rtsne package where a large number of columns caused a stack overflow.
v.1.1 <3/8/2019> Fixed a bug that could potentially produce “issues found in R, memory exhausted?” error message. Added a submenu for the MNIST example dataset.

v.1.0 <2/26/2019> Initial version.

Comments

Hi MJ,
In the images above, should each cluster (group of points) be treated as sharing the same attribute? Or should points with different colors be treated as having different attributes?

Hi Mujahida,
The colors indicate the true labels for this dataset, so data points with the same color should have similar attributes. The clusters were estimated by t-SNE and UMAP, so you should expect to see some color mismatches.

Hi MJ, 

 

I am getting an error message when I try this addin. It looks like it is not recognizing my installation of R. Would you be able to help figure out how to get your addin to recognize my R installation if that is indeed the issue?

 

I am including a screenshot showing that R is open but not being recognized, and a screenshot of my R install location (not program files) and also copying some of the error message below.

 

This looks like a very exciting tool and I am hoping to use it. I've previously found t-sne in R to be useful and it would be of great value to be able to do this right in JMP. Any help you might provide is greatly appreciated.

 

Thanks!

 

Screenshots of error and open R instance, then screenshot of R install location

t-sne addin not recognizing R install.png
r install location.png

 

Error code

 

"

An installation of R cannot be found on this system. JMP R support requires R version 2.9.1 or higher in access or evaluation of 'Glue' , write2lastRun( pb2, _addinPath_ );
algr = cbb << get selected();
Print( "Got algr!" );
Try(
dim1 = dimBx1 << get;
per1 = perBx1 << get;
iter1 = iterBx1 << get;
);
Try(
dim2 = dimBx2 << get;
per2 = perBx2 << get;
iter2 = iterBx2 << get;
dist = distbox << get;
);
Print( "Got parameters!" );
Try( predictor = selectedX << getitems );
If( Length( predictor ) < 1,
Throw( "Please specify Predictors" )
);
Try( labelY = selectedY << getitems );
If( N Items( labelY ) > 0,
labelY1 = labelY[1];
grphVars = Eval Insert( "X( :Y2 ), Y( :Y1 ), Color( :^labelY1^ ) " );
,
labelY1 = "";
grphVars = "X( :Y2 ), Y( :Y1 )";

***** Text Truncated *****"

 

 

Hi marxx,
Your problem was likely caused by multiple installations of R on your machine, so JMP couldn't decide which one to use. Please try setting R_HOME as an environment variable as follows, and then try this add-in again. Open the Windows CMD console and type: setx R_HOME "This PC\Documents\R\R-3.5.2". Also make sure the Rtsne and umap packages are installed in this version of R. Please let me know if this solves your problem.
Also, we have a few threads on the community talking about this issue that you can check out.
https://community.jmp.com/t5/Administration-Discussions/Help-JMP-find-R-installation/td-p/6357
https://community.jmp.com/t5/Discussions/Setting-path-to-R-location/td-p/59764
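As a rough way to sanity-check a candidate R_HOME value before passing it to setx, the Python sketch below tests whether a folder looks like the root of an R installation. It assumes that an R root folder contains a bin subfolder holding R.exe (Windows) or R (macOS/Linux). The function name looks_like_r_home is hypothetical, and the demo uses throwaway folders rather than a real installation.

```python
import os
import tempfile


def looks_like_r_home(path):
    """Heuristic check: does `path` contain bin/R.exe or bin/R?"""
    bin_dir = os.path.join(path, "bin")
    return any(
        os.path.isfile(os.path.join(bin_dir, name)) for name in ("R.exe", "R")
    )


# Demo with temporary folders (no real R installation is required).
with tempfile.TemporaryDirectory() as tmp:
    # Layout that mimics an R root folder: <root>/bin/R.exe
    good = os.path.join(tmp, "R-x.y.z")
    os.makedirs(os.path.join(good, "bin"))
    open(os.path.join(good, "bin", "R.exe"), "w").close()

    # Layout where R.exe sits directly in the folder (not an R root).
    bad = os.path.join(tmp, "Scripts")
    os.makedirs(bad)
    open(os.path.join(bad, "R.exe"), "w").close()

    good_ok = looks_like_r_home(good)
    bad_ok = looks_like_r_home(bad)

print(good_ok, bad_ok)  # True False
```

Note that setx only affects programs started after it runs, so JMP most likely needs to be restarted before it sees a new R_HOME value.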


Hello @MJ,

I tried running your sample data, but had an access violation crash. In the log file, it looks like the R side ran successfully. Any ideas how I can address the issue?

 

Thanks,

Al

 

Embedding - JMP window.jpg
JMP error message.jpg
log file.jpg

Hi @ngalphie, this looks like a problem with the older version of JMP.

1. I noticed that you are using JMP 13. It might help to update to the latest version (JMP 14.3) and try this add-in again.

2. Send the crash report saved at C:\Users\username\AppData\Local\Temp\JmpCrashReports\13 to tech support (https://support.sas.com/ctx/supportform/createForm?ctry=us_JMP) or to me directly (Meijian.Guan@jmp.com) so we can dig into it.

3. There are a few posts on the community that also discuss this issue. You can take a look to see whether anything they mention helps.

https://community.jmp.com/t5/Discussions/JMP-has-performed-an-access-violation-and-will-shut-down-wh...

https://community.jmp.com/t5/Discussions/JMP-Access-Violation-Likely-Causes/m-p/5648#M5647

Thank you very much for providing an interface for JMP.

 

I wonder if these addins can include the R/Python executables so we can run them directly.

 

 

@FN Hey there, thank you for your suggestions. It would be a great option to include R/Python executables but due to our policies and legal concerns, I did not include them. Please let me know if you have any problems regarding R installations or versions when using this add-in. 

 

Thank you. To be honest, I am not sure what the best way to install R is. I am used to managing Python installations with conda/anaconda, which now also includes R.

 

This is the path where I have R installed.

 

(base) C:\Users\john_doe>where r
C:\Users\john_doe\AppData\Local\Continuum\miniconda3\Scripts\R.exe

 

I guess I need to install these packages

https://anaconda.org/conda-forge/r-tsne

https://anaconda.org/conda-forge/r-umap

 

To make JMP able to find my R installation, I execute this (or change the PATH manually):

setx R_HOME "C:\Users\john_doe\AppData\Local\Continuum\miniconda3\Scripts\"

 

If there is a detailed guide on how to do this better, please let me know.

 

 

I managed to run umap but not via Anaconda/conda.

 

I think I am installing the wrong package for t-SNE. Can you provide the CRAN URL?

 

Here is the step by step.

 

Install R from https://cran.r-project.org/

Install RStudio (community edition) from https://www.rstudio.com/products/rstudio/download/#download

Use RStudio to install tsne and umap.

Rinstallation.png

 

Hello @FN, it looks like you installed a different t-SNE package. Could you please try installing Rtsne through RStudio instead? The GitHub version of this package is here: https://github.com/jkrijthe/Rtsne. Also, please make sure your R_HOME path points to the R version that has umap and Rtsne installed. Let me know if you have further questions.

Dear @MJ 

Thanks for this nice add-in. I used it on a Mac: t-SNE with R version 3.3.3 worked fine, but UMAP did not.

I have now tried it on Windows using the latest R version, 3.6.1, with RStudio. I installed the umap and Rtsne packages, but unfortunately neither umap nor Rtsne worked. I use JMP 14.3.

Here is what log says when I try the mnist data after PCA (using 2 PCs as predictors):

 

{"UMAP"}
{2, 3, 200, 0.1}
"Got algr!"
"Got parameters!"
"Dim of inData2R is: "
10000
3
"Start backend!"
"UMAP is selected!"

TKIntRJMP.R version 14.0
label is: label
dim of inDataUniq is: 10000 3
Ready for Run
We are running UMAP
An exception of type c0000005 occurred at address 6c910ef2 while processing the submitted R statements. This address is at offset 10ef2 into module "C:\Program Files\R\R-3.6.1\bin\i386\R.dll"
An exception of type c0000005 occurred at address 6c910ef2 while processing the submitted R statements. This address is at offset 10ef2 into module "C:\Program Files\R\R-3.6.1\bin\i386\R.dll"
Issues found in R, could be caused by unsuccessful installation of Rtsne/umap packages or limited memory.

{"t-SNE"}
{2, 5, 500}
"Got algr!"
"Got parameters!"
"Dim of inData2R is: "
10000
3
"Start backend!"
"t-SNE is selected!"

TKIntRJMP.R version 14.0
label is: label
dim of inDataUniq is: 10000 3
Ready for Run
We are running t-SNE
dim of inDataTsne is: 10000 2
Read the 10000 x 2 data matrix successfully!
OpenMP is working. 1 threads.
Using no_dims = 2, perplexity = 5.000000, and theta = 0.500000
Computing input similarities...
Building tree...
- point 10000 of 10000
Done in 0.45 seconds (sparsity = 0.001695)!
Learning embedding...
Iteration 50: error is 120.349392 (50 iterations in 2.08 seconds)
Iteration 100: error is 103.025177 (50 iterations in 1.94 seconds)
Iteration 150: error is 94.995130 (50 iterations in 1.70 seconds)
Iteration 200: error is 90.828601 (50 iterations in 1.75 seconds)
Iteration 250: error is 87.964178 (50 iterations in 1.81 seconds)
Iteration 300: error is 4.069311 (50 iterations in 1.82 seconds)
Iteration 350: error is 3.474366 (50 iterations in 1.83 seconds)
Iteration 400: error is 3.044447 (50 iterations in 1.84 seconds)
Iteration 450: error is 2.719790 (50 iterations in 1.83 seconds)
Iteration 500: error is 2.466420 (50 iterations in 1.84 seconds)
Fitting performed in 18.44 seconds.
[,1] [,2]
[1,] -18.4173747 30.133875
[2,] -23.5082997 -17.985255
[3,] 0.7376548 -4.148267
[4,] -5.6881024 6.385878
[5,] 8.8457460 22.245393
[6,] -10.8212058 -17.013940
[1] "Analysis done!"
An exception of type c0000005 occurred at address 6c910ef2 while processing the submitted R statements. This address is at offset 10ef2 into module "C:\Program Files\R\R-3.6.1\bin\i386\R.dll"
Issues found in R, could be caused by unsuccessful installation of Rtsne/umap packages or limited memory.

 

It would be great if you had an idea of what I should try. Thanks in advance and best regards, Patrick
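One detail that can be read directly from the log above is which R.dll was loaded when the exception occurred. The Python sketch below pulls the module path out of the exception line and reports the folder it sits in. It relies on the R for Windows convention that bin\i386 holds the 32-bit build and bin\x64 the 64-bit build; the function name r_module_from_log is hypothetical. Exception code c0000005 is the Windows access violation code. Whether the 32-bit build is related to the crash is not established here; the sketch only surfaces the information.

```python
import re
from pathlib import PureWindowsPath

# Exception line copied from the log above.
LOG_LINE = r'''An exception of type c0000005 occurred at address 6c910ef2 while processing the submitted R statements. This address is at offset 10ef2 into module "C:\Program Files\R\R-3.6.1\bin\i386\R.dll"'''


def r_module_from_log(line):
    """Extract the quoted module path and the folder name that contains it."""
    match = re.search(r'into module "([^"]+)"', line)
    if match is None:
        return None
    module = PureWindowsPath(match.group(1))
    arch_dir = module.parent.name
    # Folder naming used by R for Windows builds that ship both architectures.
    arch = {"i386": "32-bit", "x64": "64-bit"}.get(arch_dir, "unknown")
    return {"module": str(module), "arch_dir": arch_dir, "arch": arch}


info = r_module_from_log(LOG_LINE)
print(info["arch_dir"], info["arch"])  # i386 32-bit
```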

Hi Meijian,

 

Good afternoon. I tried to use t-SNE and UMAP on the sample data. I use the latest version of RStudio and JMP 13, but I got the following error:

ERROR JMP.png

 

Please let me know how I can fix it. Thanks for your help.

PS. I won't be able to upgrade to JMP 14 at this moment.

 

Sincerely,

Arif

Dear Sir,

The add-in does not run.


 

t-sne addin not recognizing R install.png

Dear @raewdy, I believe I have responded above regarding this issue. It's likely because JMP R Interface had trouble finding your R installation.

Please try setting R_HOME as an environment variable as follows, and then try this add-in again. Open the Windows CMD console and type: setx R_HOME "Path to R". Also make sure the Rtsne and umap packages are installed in this version of R. Please let me know if this solves your problem.

If you are using a Mac, you need to downgrade R to version 3.3.3 and use only t-SNE.
Also, we have a few threads on the community talking about this issue that you can check out.
https://community.jmp.com/t5/Administration-Discussions/Help-JMP-find-R-installation/td-p/6357
https://community.jmp.com/t5/Discussions/Setting-path-to-R-location/td-p/59764

 

Hope that helps!

Dear sir:

After installing the tsne, Rtsne, and umap packages and typing the script below in CMD, the add-in runs ✌

thank you very much

2019-09-18.png

@raewdy Thank you for letting me know. Very glad to hear it!

Dear @MJ,

When attempting to run the program on the test data set, I got an error saying "Issues found in R, could be caused by unsuccessful installation of Rtsne/umap packages or limited memory."

 

I'm pretty sure I downloaded everything I need (the packages and R, too), so if you could clue me in as to why this isn't working, that would be much appreciated.

 

Regards,

Dhruv Bhattaram

Hi @DBhattaram, it's possible that JMP didn't find the right R version, or that you are hitting the R versioning issues with the JMP R interface. Could you please open the log file (CTRL+Shift+L) when you see the error message and send the detailed log info to me at Meijian.Guan@jmp.com? I'd be happy to take a look.

 

MJ

Are you using a Mac? Faced the same problem! Unfortunately UMAP did not work.
Rtsne nicely worked.

Best wishes
Pat

Hi, @MJ 

 

I sent the log of all my failed attempts at getting the program to run. Hopefully, that will be of some use.

 

Regards,

Dhruv

@Pat1 This is on Windows for me

Thank you @DBhattaram. It looks like you didn't have the Rtsne and umap packages installed in the R version JMP is talking to. If you have multiple versions of R, make sure you set R_HOME as an environment variable as follows: open the Windows CMD console and type: setx R_HOME "your Path to R". Also make sure the Rtsne and umap packages are installed in this version of R. Let me know if it solves your problem.
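A quick way to check whether the packages are present in a particular R installation is to look inside its library folder. The Python sketch below is a rough heuristic, not part of the add-in: it assumes packages were installed into the library folder under the R root, and that each installed package folder contains a DESCRIPTION file. The function name missing_r_packages is hypothetical, and packages installed into a per-user library outside the R root will not be detected.

```python
import os
import tempfile


def missing_r_packages(r_home, packages=("Rtsne", "umap")):
    """Return the packages that have no DESCRIPTION file under <r_home>/library."""
    library = os.path.join(r_home, "library")
    return [
        pkg
        for pkg in packages
        if not os.path.isfile(os.path.join(library, pkg, "DESCRIPTION"))
    ]


# Demo with a temporary folder standing in for an R root folder.
with tempfile.TemporaryDirectory() as fake_r_home:
    # Simulate an installation where only Rtsne has been installed.
    pkg_dir = os.path.join(fake_r_home, "library", "Rtsne")
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "DESCRIPTION"), "w").close()
    missing = missing_r_packages(fake_r_home)

print(missing)  # ['umap']
```

To use it on a real machine, pass the same folder that R_HOME points to. An empty list means both packages were found there.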

Hello, I have the same error saying "Issues found in R, could be caused by unsuccessful installation of Rtsne/umap packages or limited memory." I installed UMAP and TSNE from Rstudio. I use R x64 3.5.2 under Windows

Hello @Lu, sorry for the lack of detail in the error message. Could you double-check that you installed the Rtsne package, not the TSNE package, in R? Let me know if that fixes your problem.

I did install Rtsne as suggested, but I now get the following error message:

 

 

Capture.PNG

@Lu Your problem was likely caused by multiple installations of R on your machine, so JMP couldn't decide which one to use. Please try setting R_HOME as an environment variable as follows, and then try this add-in again. Open the Windows CMD console and type: setx R_HOME "This PC\Documents\R\R-3.5.2". Also make sure the Rtsne and umap packages are installed in this version of R. Please let me know if this solves your problem.

I am a Windows 7 user and not an ICT expert, so I cannot find how to get into the "CMD console", sorry :-(. I removed R and reinstalled it in the same folder as JMP Pro. I am still getting the same error message as above. Was that not a good idea?

@Lu Sorry for the trouble. On Windows 7, I think you can go to the Start menu, type cmd or command in the search box, and then press Enter. Alternatively, go to the Start menu, open All Programs, then Accessories, and click the Command Prompt shortcut.

I am still receiving the following error when executing the Embedding add-in.

 

Eror JMP Embedding Add-in.PNG

The add-in is still not working in JMP Pro after running the CMD command. In R, I used the "Packages" > "Install Packages" menu to install the umap and Rtsne packages. Any other suggestions?


 

So, is t-SNE/UMAP going to be added to a future version of JMP? It is a very powerful abnormal event detection technique with wide applicability across industry. I just read this paper. t-SNE performs very well on the Tennessee Eastman Process dataset:

A new unsupervised data mining method based on the stacked autoencoder for chemical process fault di... 

Hi @markschahl, I know John Sall was working on a JMP version of t-SNE, but I am not sure when we are going to release it. I agree that it's a powerful method, and implementing it natively in JMP would be the best approach because calling out to R has many potential issues. You can request it as a new feature through our technical or product management teams.

@MJ & @markschahl , another thought is to add it to the JMP Wishlist on the community:

 

https://community.jmp.com/t5/JMP-Wish-List/idb-p/jmp-wish-list

 

This will help with getting some numbers around the demand and let others voice their support for the capability (I agree it would be great to have in JMP, too!)

 

M

@MikeD_Anderson Thank you Mike!