A couple of days ago I tried, somewhat thoughtlessly, an MCA (multiple correspondence analysis) on four variables from the Twitter record. These were all user names, user screen names, in-response-to names, and the like. All are alphabetic strings of arbitrary length. The procedure ran perfectly.
Now I have started to redo the analysis, this time thinking first (my obvious mistake the first time). On a 4257-case file, I cannot get the MCA to finish executing (though perhaps I have not waited long enough). The Windows Task Manager tells me the program is not responding, but, of course, its memory use and CPU usage continually change.
I speculated about the number of categories that might be generated from these variables. So, I picked the user ID and the ID of the in-response-to entity that occasioned the tweet. This is running on a Dell 8-core machine with 64 GB of memory. It has been running for two hours, using about 8% of the CPU (varying by a couple of percent) and under 5,000 MB of memory. The comment line still reads "not responding." I can let the job run all night at no extra cost. But the underlying logic bothers me, because I do not know it well enough.
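One way to test that speculation before launching the MCA is to count the distinct values (levels) in each variable, since the usual MCA coding turns every level into its own indicator column. A minimal Python sketch follows; the `records` list is hypothetical stand-in data, not the actual Twitter table, and `category_counts` is an illustrative helper, not a JMP function.

```python
# Hypothetical stand-in for two columns of the tweet table:
# each record is (user_id, in_response_to_id).
records = [
    ("u1", "u7"),
    ("u2", "u7"),
    ("u3", "u1"),
    ("u1", "u2"),
    ("u4", ""),  # empty string: a tweet that was not a reply
]

def category_counts(rows):
    """Return the number of distinct values (levels) in each column."""
    columns = list(zip(*rows))
    return [len(set(col)) for col in columns]

levels = category_counts(records)
n_cases = len(records)
# In an indicator (one-hot) coding, each level becomes one column.
indicator_columns = sum(levels)

print("cases:", n_cases)                                   # cases: 5
print("levels per variable:", levels)                      # [4, 4]
print("indicator matrix:", n_cases, "x", indicator_columns)  # 5 x 8
```

With ID-like variables, nearly every case can bring a new level, so for two such variables the indicator matrix widens toward 2 x 4257 columns.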
a. If a variable uses numbers but is given the nominal modeling type, the maximum number of categories is the number of cases (e.g., the number of tweet authors in the file).
b. If a variable uses arbitrary characters, words, or names, the same limit holds: the maximum dimension in the direction of the cases is the number of observations. So there is no difference.
c. If a variable is modeled as a number, even if ordinal, the number of positions might be the number of places between the lowest and highest values in the set.
d. One of the JMP Community members did a 600,000-case study and produced neat graphs. He must have used a magic wand, or had very few values in the target variable.
e. If I must reckon with creating a who-to-whom matrix, I must be prepared for (in my case) a 4257 x 4256 matrix or thereabouts, which is then to be simplified. But at first guess, this matrix would be under 19 million words. That is not much for a computer.
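The arithmetic in (e) can be checked directly. This sketch assumes a dense matrix stored at 8 bytes per cell (double precision); that storage format is an assumption, since the format JMP actually uses is not known here, and `dense_matrix_size` is a hypothetical helper.

```python
def dense_matrix_size(rows, cols, bytes_per_cell=8):
    """Return (cells, bytes) for a dense rows x cols matrix.

    bytes_per_cell=8 assumes double-precision storage (an assumption).
    """
    cells = rows * cols
    return cells, cells * bytes_per_cell

cells, nbytes = dense_matrix_size(4257, 4256)
print(f"{cells:,} cells")         # 18,117,792 cells
print(f"{nbytes / 1e6:,.0f} MB")  # 145 MB at 8 bytes per cell
```

So the estimate of under 19 million words holds, and even at 8 bytes per cell the matrix is about 145 MB, well within 64 GB.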
What am I failing to understand? Am I limited to something like my 4257 cases times a variable with only a dozen or two values?
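Storage alone is probably not the bottleneck; the decomposition step may well be. Assuming the textbook MCA formulation (an n x J indicator matrix, where J is the total number of levels, and a J x J Burt matrix decomposed with a dense eigensolver at roughly J^3 work), the sketch below compares two near-unique ID variables against two variables with a couple of dozen levels each. Whether JMP's MCA platform works this way internally is an assumption, the "one level per case" figure is a worst case, and `mca_sizes` is a hypothetical helper.

```python
def mca_sizes(n_cases, levels_per_variable):
    """Rough size estimates for a classical MCA (assumed formulation).

    Assumes an n x J indicator matrix Z and a J x J Burt matrix B = Z^T Z
    that is decomposed densely, costing on the order of J**3 operations.
    """
    J = sum(levels_per_variable)
    return {
        "indicator_cells": n_cases * J,
        "burt_cells": J * J,
        "eigen_ops_order": J ** 3,
    }

# Worst case: two ID variables with one distinct level per case.
ids = mca_sizes(4257, [4257, 4257])
# Two variables with a couple of dozen levels each.
small = mca_sizes(4257, [24, 24])

for name, s in [("ID variables", ids), ("24-level variables", small)]:
    print(name, {k: f"{v:.2e}" for k, v in s.items()})
```

Under these assumptions the ID variables need several million times more decomposition work than the 24-level case, which would be consistent with a job that runs for hours while the 600,000-case study in (d) finished quickly.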
So, since you folks have been so kind in the past, I thought I might toss this out. If someone wants, I can upload a table with all the cases and a few variables, but I don't want to flood somebody else's storage.