Solved: Re: Singular value decomposition outside of JMP

caseylott · Nov 8, 2019 09:42 AM

Hello, I’m a long-time JMP user (3.2!) and an abysmally untalented beginner in R. Text analysis is new to me, as an ecologist, but I’ve really been enjoying playing with the Text Explorer platform. I have produced a sparse document term matrix that I would like to subject to singular value decomposition in order to do additional analyses in JMP on a dense matrix of singular values for ~100 vectors. I have run into a bit of a wall since I do not have JMP Pro (can’t afford it). I think this means I need to read my sparse document term matrix into R, run some kind of svd function, and then read my new dense matrix back into JMP. This seems like it might be an easy task for someone who is more familiar with text analysis and proficient in R. I’ve struggled to figure out which R package to use to do this (it seems like there are many). When I’ve looked at documentation for different packages, I’ve struggled to figure out which functions and arguments I should use to accomplish this task. I know, it’s time for me to commit to learning R. JMP has shielded me from this inevitability for many years. If there is anyone out there who could share some R advice, or better yet, code, that would read in a sparse matrix as a .csv, create a dense singular value matrix in R, and then write this matrix as a new .csv, I would be very grateful. Thank you!

G_M · Nov 9, 2019 01:42 PM

@caseylott a good way to get started using R and JMP is to replicate in R results that you are familiar with in JMP.

Like you, I prefer JMP to R for many reasons. In fact, I was using R back in 2009 when I was presenting results to an executive team and, as always happens, the executive had additional questions (which I failed to anticipate; I have gotten better at this), and I couldn't just produce the new analysis in R, live. I asked for 25 minutes to go and conduct the analysis and bring back results... I lost their attention. That night I went home and reflected on how I could avoid this in the future: 1) better anticipate further analysis and 2) identify a more agile software. JMP re-entered my life (a colleague had introduced me to JMP a few years earlier) and has been my chosen software since.

That said, here is some JMP to R syntax that will enable you to get started.

This may be too basic, but I hope it is helpful. I use R to estimate non-linear mixed effects models, polytomous IRT models, and recently started estimating Bayesian models using both R and JAGS, all through JMP.

Names Default To Here( 1 );

//Open Big Class in JMP
df = Open( "$SAMPLE_DATA/Big Class.jmp" );

//execute bivariate regression of weight on height in JMP
df << Bivariate( Y( :weight ), X( :height ), Fit Line( {Line Color( {213, 72, 87} )} ) );

//make sure that you have embedded log open in your JSL Scripting window.

//Initiate R
R Init();

//send data frame to R
R Send( df );

//submit R Code for bivariate regression of weight on height
R Submit(
"\[
fit1 <- lm(weight~height, data = df)
summary(fit1)
]\"
);//close R Submit
//Look into the log and see the R Results
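
For the CSV round trip asked about in the original question, a minimal R sketch might look like the following. It is only an illustration under stated assumptions: the helper name `svd_document_scores`, the file paths, and the small made-up matrix are all hypothetical, and it assumes the exported document-term matrix has one row per document and one numeric column per term. It uses base R's `svd()`, which works on a dense matrix; for a very large sparse matrix, a truncated SVD package such as `irlba` may be a better fit.

```r
# Hypothetical helper: returns the first k SVD "document scores" (U %*% D)
# for a document-term matrix supplied as a data frame.
svd_document_scores <- function(dtm, k = 100) {
  # Keep numeric columns only (assumption: ID or text columns are not terms).
  x <- as.matrix(dtm[sapply(dtm, is.numeric)])
  storage.mode(x) <- "double"

  # k cannot exceed the smaller dimension of the matrix.
  k <- min(k, nrow(x), ncol(x))

  # Base R SVD; request only the first k left singular vectors.
  s <- svd(x, nu = k, nv = 0)

  # Scale each left singular vector by its singular value.
  scores <- s$u %*% diag(s$d[seq_len(k)], nrow = k)
  colnames(scores) <- paste0("SVD", seq_len(k))
  as.data.frame(scores)
}

# --- Demo with a small made-up matrix (stand-in for the JMP export) ---
set.seed(1)
demo <- matrix(rpois(6 * 8, lambda = 0.5), nrow = 6)
colnames(demo) <- paste0("term", 1:8)

in_file  <- tempfile(fileext = ".csv")  # hypothetical path of the JMP export
out_file <- tempfile(fileext = ".csv")  # hypothetical path to read back into JMP
write.csv(demo, in_file, row.names = FALSE)

dtm    <- read.csv(in_file)
scores <- svd_document_scores(dtm, k = 3)
write.csv(scores, out_file, row.names = FALSE)
```

With real data, `in_file` would point at the CSV saved from Text Explorer and `k` would be set to the number of singular vectors wanted (about 100 in the question). The resulting file has one row per document and can be opened in JMP like any other CSV.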


Mark_Bailey · Nov 8, 2019 09:59 AM

You are welcome to use R with JMP but I don't think that way is necessary. You can save the weighted document-term matrix from the Text Explorer and then analyze the new data columns with any platform in JMP, including those in the Multivariate group. This includes principal components analysis, latent class analysis, and latent semantic analysis without JMP Pro.

Note that the versions in TE are optimized for sparse matrices like the DTM, but you should at least try to use the stand-alone platforms.

caseylott · Nov 8, 2019 11:37 AM

OK, thanks for the suggestion. I've tried a couple of things so far that haven't worked. Perhaps I'm doing something wrong. I've saved the weighted document term matrix and then entered it in as the y variable for: 1) the principal components platform and 2) the latent class analysis platform. I don't see any options for latent semantic analysis in my non-Pro version of JMP. Where would these be located?
When I try to run the weighted document term matrix in the principal components platform, JMP just spins and inevitably crashes.
When I try to put the document term matrix into the Y role in the latent class analysis platform, I get the message that I am using an unsupported data type.
Any suggestions?

Mark_Bailey · Nov 8, 2019 01:58 PM

First, I misspoke. It is difficult for me to remember which features are in each version of JMP and all the differences between JMP and JMP Pro. You are correct: only PCA and latent class analysis are available in JMP.

Second, I was afraid that the large and sparse DTM would be too much for JMP outside of the Text Explorer platform. You might try trimming the term list. In general, curating the term and phrase list can lead to much more meaningful analyses than relying on the default parameters. This is one way to reduce the size of the DTM. You can also reduce the maximum number of terms or increase the minimum term frequency when saving the DTM.

Last, are you using 64-bit JMP?

caseylott · Nov 8, 2019 03:32 PM

Thanks, Mark. I haven't tested how far I need to go with shrinking down my document term matrix to be able to use the latent semantic class analysis in the JMP (not JMP Pro) platform. Perhaps I could make this work, and I'll try, but what about when I have an even larger document set? I predict that I'll run into this problem again.

I think that my original assessment may still be correct. It seems like I still may need a way to transform my sparse document term matrix (created in JMP) into a dense matrix (via singular value decomposition in R) so that I can read the dense matrix back into JMP for additional multivariate analyses.

In other words, I think that my inability to create a dense matrix of singular values in JMP, without JMP Pro, is truly limiting my ability to go very far with text analysis in JMP.

If anyone has any other suggestions for how to deal with this in JMP, or perhaps some R-based solutions, I'd love to hear them.

Mark_Bailey · Nov 9, 2019 07:06 AM

The new SVD routine in the Text Explorer is a real game-changer and just what further text analysis needs.

That is all that I have to offer. I hope that others have some useful suggestions besides reducing the size of the DTM.

caseylott · Nov 9, 2019 12:30 PM

OK, if I understand @Mark_Bailey correctly, SVD is a "game-changer"; however, you need JMP Pro to play this game with JMP. Is this correct?

Thinking about this a little further, it seems like JMP (non-Pro version) does not allow any of the text analysis features that I've seen in videos or read about in blogs: latent class analysis, latent semantic analysis, topic analysis, discriminant analysis, document clustering, or term clustering. As a long-time JMP user who can afford JMP but not JMP Pro, I'm still trying to figure out my most efficient next steps.

For example, if I were able to generate a singular value matrix (e.g., outside of JMP in R) and bring this matrix back into JMP, it seems like these analyses would still be unavailable to me, at least not directly, since I do not even have menu items (via the Text Explorer hotspot) for any of these analysis platforms. I just don't know enough about the statistical details to understand if trying to replicate these types of text analysis in JMP using standard principal components or clustering platforms is a good idea. For example, I just saw a post by @LauraCS where she said "latent semantic analysis (LSA) is simply PCA done on a correlation matrix of text terms (but without any rotations involved)". I'm wondering just how literal this statement is. Does this mean that if I could bring a singular value matrix into JMP, I could use JMP's PCA tools to do analyses that are essentially identical to both Latent Semantic Analysis and Latent Class Analysis? It seems kind of risky for me to do this without a better understanding of the statistical details.
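
One way to probe how literal the PCA statement is would be a small numerical check in R. The sketch below uses a made-up 6 x 4 count matrix (an assumption, not real text data) and base R's `prcomp()` and `svd()`. It relies on a standard fact: PCA on the correlation matrix is equivalent to an SVD of the centered and scaled data matrix. LSA, as it is commonly described, applies a truncated SVD to the weighted document-term matrix itself, often without centering, so its scores need not match PCA scores. Latent class analysis is a different kind of model (a mixture model for categorical data) and is not covered by this check.

```r
# Made-up document-term counts: 6 documents (rows) x 4 terms (columns).
x <- matrix(c(2, 0, 1, 0,
              0, 3, 0, 1,
              1, 0, 2, 0,
              0, 1, 0, 2,
              3, 0, 0, 1,
              0, 2, 1, 0), nrow = 6, byrow = TRUE)

# PCA on the correlation matrix: center and scale each term column.
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# SVD of the same centered, scaled matrix; scores are U %*% D.
z <- scale(x, center = TRUE, scale = TRUE)
s <- svd(z)
svd_scores <- s$u %*% diag(s$d)

# Compare absolute values because each component's sign is arbitrary.
max_diff <- max(abs(abs(pca$x) - abs(svd_scores)))

# SVD of the raw, uncentered matrix (closer to how LSA is usually described).
s_raw <- svd(x)
raw_scores <- s_raw$u %*% diag(s_raw$d)
raw_diff <- max(abs(abs(pca$x) - abs(raw_scores)))

print(max_diff)  # agreement between PCA scores and centered/scaled SVD scores
print(raw_diff)  # disagreement between PCA scores and raw-matrix SVD scores
```

If `max_diff` is essentially zero while `raw_diff` is not, that suggests the equivalence holds only when the matrix is centered and scaled the way PCA does it, which is worth keeping in mind before treating JMP's PCA output on SVD columns as a stand-in for LSA.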

So... after obsessing about this topic for a few days, it seems like my best option might be to figure out how to go back and forth between JMP (for curation of a document term matrix and visualization) and R (for analyses) like I've seen in the @herush and @andrewtkarl video called "Text Mining in JMP and R". Seems like it might be a good time to have lunch with one of my R friends and pitch a collaboration.

I have no idea how many JMP users are in my spot (e.g., unlikely to move up to JMP Pro). Maybe I'm just pushing on stuff that the typical JMP user doesn't get at. However, if there are others like me, it would be awesome if I could find my way to additional "help" or "training" resources (e.g., blogs, tutorials, third-party videos) that illustrate synergies between JMP and R and ways to deal with situations like the one I am currently in, where 2 tools (JMP and R) are going to be necessary to get the job done.

Like I've said before, I absolutely love JMP and have used it all of my career. This won't change, but it seems like it's time for me to knuckle down and learn R. I've been told it will be worth my time, but dang, I really prefer the interactive, menu-driven workflow of JMP.
