Memory Demands of Multiple Correspondence Analysis

LNitz — Sat, 10 Jun 2023 23:48:34 GMT

A couple of days ago I tried, somewhat thoughtlessly, an MCA on four variables from the Twitter record. These were all user names, user screen names, in-response-to names, and the like. All are alphabetic strings of varying length. The procedure ran perfectly.

Now I have started to redo the analysis, thinking first. My obvious mistake. On a 4257-case file, I cannot get the MCA to finish executing (or perhaps I have not waited long enough). The Windows Task Manager tells me the program is not responding, but, of course, memory use and CPU usage continually change.

I speculated about the number of categories that might be generated from these variables. So, I picked the user ID and the ID of the in-response-to entity that occasioned the tweet. This is running on an 8-core Dell machine with 64GB of memory. It has been running for two hours, using 8% of the CPU (varying by a couple of percent) and under 5,000 MB of memory. The comment line lists "not responding." I can let the job run all night at no extra cost. But the underlying logic bothers me, because I do not know it well enough.

a. If a variable uses numbers but is assigned the nominal modeling type, the maximum number of categories is the number of cases (e.g. the number of tweet authors in the file).

b. If a variable uses arbitrary characters, words, or names, the same number holds--the maximum dimension in the direction of the cases is the number of observations. So no difference.

c. If a variable is modeled as a number, even if ordinal, the number of positions might be the number of places between the lowest and highest number in the set.

d. One of the JMP Community members did a 600,000 case study and produced neat graphs. He must have used a magic wand--or had very few values in the target variable.

e. If I must reckon with creating a who-to-whom matrix, I must be prepared for (in my case) a 4257 x 4256 or thereabouts matrix which is then to be simplified. But at first guess, this matrix would be under 19 million words. That is not much for a computer.
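As a rough check on the arithmetic in points a through e, here is a minimal Python sketch that counts the levels in each nominal variable and the sizes of the two matrices MCA is usually described in terms of: the indicator matrix (cases by total categories) and the Burt matrix (total categories by total categories). The toy rows, the function name `mca_sizes`, and the 8-bytes-per-cell figure are assumptions for illustration only; nothing here reflects how JMP stores or decomposes the data internally.

```python
# Hypothetical toy records: (author, in_reply_to). The real file has 4257 rows.
rows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("dave", "alice"),
]

def mca_sizes(rows):
    """Return (levels per column, indicator shape, Burt shape) for nominal columns."""
    n = len(rows)
    # Distinct values in each column = number of categories for that variable.
    levels = [len(set(col)) for col in zip(*rows)]
    total = sum(levels)  # total number of categories across all variables
    return levels, (n, total), (total, total)

levels, indicator_shape, burt_shape = mca_sizes(rows)
print(levels, indicator_shape, burt_shape)

# Worst case from the post: 4257 cases, two ID variables, every value distinct.
n_cases = 4257
worst_total = 2 * n_cases
burt_cells = worst_total ** 2
# Cell count, and megabytes if each cell took 8 bytes (an assumption).
print(burt_cells, burt_cells * 8 / 1e6)
```

Even in that worst case the dense matrix fits comfortably in 64GB, which agrees with the observation that memory use stays low. A dense eigen or singular value decomposition typically scales with roughly the cube of the matrix dimension, so running time rather than memory may be the limiting factor, though whether that applies to this particular run is a guess.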

What am I failing to understand? Am I limited to something like my 4257 cases times a variable with only a dozen or two values?

So, since you folks have been so kind in the past, I thought I might toss this out. If someone wants, I can upload a table with all the cases and a few variables, but don't want to flood somebody else's storage.

Re: Memory Demands of Multiple Correspondence Analysis

Phil_Kay — Fri, 20 May 2022 07:52:28 GMT

I think it might help if you can share some illustrative data.

I don't really understand what the objective of your analysis is and what the data that you are trying to analyse looks like. I am not sure if MCA is appropriate.

If I understand, you have a table with 4257 rows. That is not a problem for MCA in JMP Pro.

A more important factor will be the number of levels within each variable.

Re: Memory Demands of Multiple Correspondence Analysis

LNitz — Fri, 20 May 2022 08:08:17 GMT

Thanks, Phil.

The issue is the number of categories. It is not that the program cannot compute with a thousand or so categories, but it takes a really long time. The question I am asking is who responds to whom. I will play with this a bit more to see if I can screen one of the variables to reduce the number of categories. If anything comes out of it, I will post results.
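One common way to screen a variable like this is to keep the frequently occurring names and lump the rest into a single catch-all level. The sketch below is only an illustration in Python, not a JMP feature: the function name `lump_rare`, the threshold `min_count`, the label "Other", and the sample names are all hypothetical.

```python
from collections import Counter

def lump_rare(values, min_count=2, other="Other"):
    """Replace every level seen fewer than min_count times with one catch-all label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Hypothetical in-response-to names; the real column would have 4257 entries.
reply_to = ["bob", "carol", "carol", "erin", "bob", "frank"]
lumped = lump_rare(reply_to, min_count=2)

print(lumped)
print(len(set(reply_to)), "->", len(set(lumped)))  # category count before and after
```

The threshold is a judgment call: a higher `min_count` means fewer categories and a faster run, but accounts that appear only once or twice drop out of the who-responds-to-whom picture.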

Larry
