Parallel Plot - Reducing complexity by combining lines?
Apr 3, 2020 12:12 PM
I have been using the parallel plot as an exploratory tool on a data set of ~3,500 individuals who answered the same question repeatedly over time (~12 scores each). I often use color to distinguish a trait I wish to compare (e.g. gender or other demographic traits), or I split the data by some trait on the X or Y axis. Unfortunately, the scores (z-scores) are under 2. While the plot is extremely helpful, it would be nice to reduce its complexity by somehow combining similar or equivalent lines: something that would reduce the number of lines by drawing one weighted composite line (e.g. weighted by line thickness) for each set of 'matching lines.'
Any ideas would be deeply appreciated! I am fairly new to JMP. Lastly, I am not a statistician or DBA, so please keep that in mind if you elect to respond.
You could do some data table summarization. If you classified the data in each score column into reasonable bins, you could then summarize the table by all of the binning columns. Every row that fell into the identical set of bins across all of the target columns would be collapsed into a single row, and you would plot those summary rows with your parallel plot. Alternatively, I could see writing a script that adds a button to the parallel plot: you would select 2 or more lines in the plot and push the button, and it would remove those lines from the plot and add back a single summarized line representing their average.
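To make the binning idea concrete, here is a minimal sketch in plain Python rather than JMP (the thread contains no script, so this only illustrates the logic). The function names, the bin width of 0.5, and the three sample rows are all made up for the example. Each score is snapped to the centre of its bin, rows with identical binned profiles are collapsed, and the count of collapsed rows is kept so it can drive line thickness.

```python
from collections import Counter

def bin_score(z, width=0.5):
    """Snap a z-score to the nearest multiple of `width` (the bin centre).

    The width of 0.5 is an arbitrary choice for illustration. Python's
    round() sends exact ties to the nearest even integer.
    """
    return round(z / width) * width

def summarize_profiles(rows, width=0.5):
    """Collapse rows whose binned scores match in every column.

    Returns (binned_profile, count) pairs, most frequent first. The count
    is the weight one would map to line thickness in the plot.
    """
    counts = Counter(tuple(bin_score(z, width) for z in row) for row in rows)
    return counts.most_common()

# Hypothetical data: 3 individuals, 3 scores each (the real data has ~12).
rows = [
    [0.10, 0.40, -0.20],
    [0.05, 0.45, -0.15],
    [1.20, 1.30, 0.90],
]

summary = summarize_profiles(rows)
for profile, n in summary:
    print(profile, n)
```

With these sample rows, the first two individuals land in the same bins and become one line with weight 2, and the third stays a separate line with weight 1. Wider bins merge more lines at the cost of detail, so the width is worth experimenting with.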
You could try using cluster analysis on the data. Treat the 12 responses as variables, with one row per subject, and cluster on those responses. Then see if there are differences in demographics across the clusters. That may give you a clue about what deeper structure there might be in the data.
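As a rough sketch of that workflow, again in plain Python rather than JMP (which has its own clustering platforms), the example below runs a bare-bones k-means on the score profiles and then cross-tabulates cluster membership against a demographic column. K-means is only one possible clustering method, and the data, the `gender` column, the choice of k = 2, and the deterministic starting centres are all assumptions made for illustration.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two score profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(rows):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(rows, k, iterations=20):
    """Very small k-means: returns (labels, centers).

    Starting centres are k rows spread evenly through the data, which
    keeps the example deterministic; real tools use better seeding.
    """
    step = len(rows) // k
    centers = [list(rows[i * step]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(row, centers[c]))
                  for row in rows]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = mean_profile(members)
    return labels, centers

# Hypothetical data: one row per subject, 3 scores each (the real data has ~12).
scores = [
    [0.1, 0.2, 0.1],
    [0.0, 0.3, 0.2],
    [1.5, 1.4, 1.6],
    [1.6, 1.5, 1.4],
]
gender = ["F", "F", "M", "F"]  # made-up demographic column

labels, centers = kmeans(scores, k=2)

# Cross-tabulate cluster against the demographic trait.
crosstab = Counter(zip(labels, gender))
for (cluster, trait), n in sorted(crosstab.items()):
    print(cluster, trait, n)
```

The cluster centres are themselves average profiles, so plotting just those few lines in a parallel plot, weighted by cluster size, also addresses the original request to reduce the number of lines.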