LNitz
Level IV

Memory Demands of Multiple Correspondence Analysis

A couple of days ago I tried, somewhat thoughtlessly, an MCA on four variables from the Twitter record. These were all user names, user screen names, in-response-to names, and the like. All are alphabetic strings of arbitrary length. The procedure ran perfectly.

Now I have started to redo the analysis, thinking first. My obvious mistake. On a 4257-case file, I cannot get the MCA to finish executing (perhaps I have not waited long enough). The Windows Task Manager tells me the program is not responding but, of course, core size and CPU usage continually change.

I speculated about the number of categories that might be generated from these variables. So I picked the user ID and the ID of the in-response-to entity that occasioned the tweet. This is running on an 8-core Dell machine with 64 GB of memory. It has been running for two hours, using 8% of the CPU (varying by a couple of percent) and under 5,000 MB of memory. The comment line lists "not responding." I can let the job run all night at no extra cost. But the underlying logic bothers me, because I do not know it well enough.

a. If a variable uses numbers but is given the nominal modeling type, the maximum number of categories is the number of cases (e.g., the number of tweet authors in the file).

b. If a variable uses arbitrary characters, words, or names, the same number holds: the maximum dimension in the direction of the cases is the number of observations. So no difference.

c. If a variable is modeled as a number, even if ordinal, the number of positions might be the number of places between the lowest and highest number in the set.

d. One of the JMP Community members did a 600,000-case study and produced neat graphs. He must have used a magic wand, or had very few values in the target variable.

e. If I must reckon with creating a who-to-whom matrix, I must be prepared for (in my case) a 4257 x 4256 or thereabouts matrix which is then to be simplified.  But at first guess, this matrix would be under 19 million words.  That is not much for a computer.
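The arithmetic in (e) can be checked directly. The sketch below is in Python rather than JSL. It computes the size of a dense who-to-whom matrix and, for comparison, the dimensions of the indicator and Burt matrices that textbook MCA works with. The level counts (4257 and 4256) are the worst case from (a) and (b), not measured values, and 8 bytes per cell is an assumption. Nothing here reflects how JMP stores or factors these matrices internally.

```python
BYTES_PER_CELL = 8  # assumption: one double-precision value per cell


def dense_megabytes(rows, cols, bytes_per_cell=BYTES_PER_CELL):
    """Memory for a dense rows x cols matrix, in megabytes (10**6 bytes)."""
    return rows * cols * bytes_per_cell / 1e6


n_cases = 4257
# Worst case from (a) and (b): nearly every case is its own level.
levels = [4257, 4256]  # hypothetical level counts for the two ID columns

who_to_whom_cells = levels[0] * levels[1]
total_levels = sum(levels)

# Textbook MCA: the indicator matrix is cases x total levels,
# and the Burt matrix is total levels x total levels.
indicator_shape = (n_cases, total_levels)
burt_shape = (total_levels, total_levels)

print(f"who-to-whom cells: {who_to_whom_cells:,}")
print(f"who-to-whom dense size: {dense_megabytes(*levels):.1f} MB")
print(f"indicator matrix: {indicator_shape}, {dense_megabytes(*indicator_shape):.1f} MB")
print(f"Burt matrix: {burt_shape}, {dense_megabytes(*burt_shape):.1f} MB")
```

Under these assumptions storage stays in the hundreds of megabytes, which agrees with the "not much for a computer" estimate. The work of a dense matrix decomposition, however, grows roughly with the cube of the number of levels, so run time can become long well before memory runs out. Whether that explains the behavior seen here depends on the algorithm JMP actually uses, which this sketch does not model.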

 

What am I failing to understand? Am I limited to something like my 4257 cases times a variable with only a dozen or two values?

 

So, since you folks have been so kind in the past, I thought I might toss this out.  If someone wants, I can upload a table with all the cases and a few variables, but don't want to flood somebody else's storage.

1 ACCEPTED SOLUTION

Accepted Solutions
Phil_Kay
Staff

Re: Memory Demands of Multiple Correspondence Analysis

I think it might help if you can share some illustrative data.

I don't really understand what the objective of your analysis is and what the data that you are trying to analyse looks like. I am not sure if MCA is appropriate.

If I understand, you have a table with 4257 rows. That is not a problem for MCA in JMP Pro.

A more important factor will be the number of levels within each variable.
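One quick way to see the number of levels per variable before launching an MCA is to count distinct values in each column. This is a minimal sketch in Python, assuming the table has been exported as a list of row dictionaries; the column names `user_id` and `in_reply_to` and the sample rows are made up for illustration.

```python
def count_levels(rows, columns):
    """Return {column: number of distinct non-missing values}."""
    levels = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value not in (None, ""):  # treat None and "" as missing
                levels[col].add(value)
    return {col: len(values) for col, values in levels.items()}


# Tiny made-up sample standing in for the tweet table.
tweets = [
    {"user_id": "a1", "in_reply_to": "b7"},
    {"user_id": "a1", "in_reply_to": "c2"},
    {"user_id": "d4", "in_reply_to": "b7"},
    {"user_id": "e9", "in_reply_to": ""},
]

level_counts = count_levels(tweets, ["user_id", "in_reply_to"])
print(level_counts)
```

If a count comes out close to the number of rows, the variable behaves more like a case identifier than a category.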

LNitz
Level IV

Re: Memory Demands of Multiple Correspondence Analysis

Thanks, Phil.

 

The issue is the number of categories.  It is not that the program cannot compute with a thousand or so categories, but it takes a really long time.  The question I am asking is who responds to whom.  I will play with this a bit more to see if I can screen one of the variables to reduce the number of categories.  If anything comes out of it, I will post results.

 

Larry
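One possible way to screen a variable so that it has fewer categories is to fold rarely occurring levels into a single catch-all level. The sketch below is in Python and is only an illustration of that idea, not something taken from the thread or from JMP. The function name `lump_rare_levels`, the threshold `min_count`, the label "Other", and the sample IDs are all hypothetical.

```python
from collections import Counter


def lump_rare_levels(values, min_count=2, other_label="Other"):
    """Replace levels seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Made-up in-response-to IDs: two frequent targets and two one-off targets.
replied_to = ["b7", "b7", "c2", "b7", "k3", "c2", "z9"]

screened = lump_rare_levels(replied_to, min_count=2)
print(screened)
print("levels before:", len(set(replied_to)), "after:", len(set(screened)))
```

The choice of threshold is a judgment call: a higher `min_count` gives fewer categories but hides more of the who-responds-to-whom detail in the catch-all level.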