frankderuyck
Level VI

CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

For a classification study I need to calculate the distribution features of the PC scores in the attached data set (the T and NON T distributions are different). A JMP table has to be constructed in which, per participant (1 to 16), per classification (T/NON T), and per sample of 100 rows, the following distribution features are calculated and listed: mean | standard deviation | variance | skewness | kurtosis | percentiles 10th/25th/50th/75th/90th | entropy

How can I do this?

7 REPLIES
jthi
Super User

Re: CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

You can, for example, use the Distribution platform with By columns:

jthi_0-1759930611454.png

Then customize the Summary Statistics and add the Quantiles you need:

jthi_1-1759930656265.png

After that, you can right-click the tables in the platform and choose Make Combined Data Table:

jthi_2-1759930694226.png

Then you can combine those two tables with Concatenate (some columns might need to be renamed first).

-Jarmo
frankderuyck
Level VI

Re: CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

Sorry for the delay. Thanks, I am going to try this ASAP.

Victor_G
Super User

Re: CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

Hi @frankderuyck,

Besides the excellent suggestion from @jthi, you could also use the Tabulate platform, which can calculate many statistics from your data table. Some other statistics, like skewness and kurtosis, can easily be calculated with a column formula from basic statistics such as the mean and std dev computed with Tabulate: https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
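
For reference, here is a sketch of the moment-based definitions given on the linked NIST page, written for a sample $Y_1, \dots, Y_N$ with mean $\bar{Y}$. In this form $s$ is the standard deviation computed with $N$ (not $N-1$) in the denominator, and the third and fourth powers of the deviations are summed over the rows, so the formula needs the row values as well as the mean and std dev. JMP's built-in Skewness and Kurtosis may use bias-adjusted variants, so the values can differ slightly; that is an assumption worth checking against the JMP documentation.

```latex
s = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2},
\qquad
\text{skewness} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^3}{s^3},
\qquad
\text{kurtosis} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^4}{s^4}
```

With this definition a normal distribution has kurtosis 3; subtracting 3 gives the excess kurtosis.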

Hope this second option may also be of interest,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
frankderuyck
Level VI

Re: CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

Thanks Victor and @jthi for your useful suggestions. However, I think there is a misunderstanding, so let me clarify my task. I need to build a classification model for two movement types, T and NON T. As the graph below shows, the PC score distributions of T and NON T (cf. the file attached above) are different, so I need a model based on the characteristics of the score distribution. To do so, for all 16 participants I want to draw samples of 100 consecutive rows from each movement type and report the distribution properties of each sample as a row in a new table, so that I can train a binary ML classification model (T/NON T).

I hope this is sufficiently clear. How can I create a table whose rows hold, for all participants, the T/NON T distribution properties of the 100-row samples?
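
To make the target table concrete, here is a minimal sketch of the computation outside JMP, in plain Python (standard library only). It is not a JMP recipe. Everything named here is hypothetical: the input is assumed to be rows of `(participant, label, score)` in the original row order, samples are taken as consecutive runs of 100 rows within each participant/label group, and a trailing partial sample is dropped. Since the thread does not define "entropy", the sketch assumes the Shannon entropy of a 10-bin equal-width histogram; skewness and kurtosis use the simple moment-based form, which may differ slightly from JMP's values.

```python
import math
from collections import defaultdict
from statistics import fmean


def percentile(sorted_x, p):
    """Linear-interpolation percentile (p in 0..100) of a pre-sorted list."""
    k = (len(sorted_x) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_x[lo] + (sorted_x[hi] - sorted_x[lo]) * (k - lo)


def histogram_entropy(x, bins=10):
    """Shannon entropy (bits) of an equal-width histogram; bins=10 is an assumption."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0  # all values identical: a single occupied bin
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    return -sum(c / len(x) * math.log2(c / len(x)) for c in counts if c)


def features(x):
    """Distribution features of one sample (a list of scores)."""
    n = len(x)
    m = fmean(x)
    dev = [v - m for v in x]
    m2 = sum(d ** 2 for d in dev) / n  # central moments with N in the denominator
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    var = m2 * n / (n - 1) if n > 1 else 0.0  # sample variance (N - 1)
    s = sorted(x)
    out = {
        "mean": m,
        "std": math.sqrt(var),
        "variance": var,
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,
    }
    for p in (10, 25, 50, 75, 90):
        out[f"p{p}"] = percentile(s, p)
    out["entropy"] = histogram_entropy(x)
    return out


def feature_table(rows, block_size=100):
    """One output row per participant, label, and full block of `block_size` scores."""
    groups = defaultdict(list)
    for participant, label, score in rows:
        groups[(participant, label)].append(score)
    table = []
    for (participant, label), scores in groups.items():
        for b in range(len(scores) // block_size):  # partial last block is dropped
            block = scores[b * block_size:(b + 1) * block_size]
            table.append({"participant": participant, "label": label,
                          "sample": b + 1, **features(block)})
    return table
```

Each dictionary in the returned list corresponds to one row of the new table, with the T/NON T label as the response and the features as predictors. Note that if T and NON T rows are interleaved in the source table, the blocks are consecutive within a group but not necessarily in the original table.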

frankderuyck_0-1760353876261.png

jthi
Super User

Re: CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

Create a formula that lets you select the rows you are interested in (you might have to add your classification column to the formula as a grouping column):

Col Rank(:Participant, :Participant) <= 100

Then create a subset (or exclude the unwanted rows) and calculate the statistics.

-Jarmo
frankderuyck
Level VI

Re: CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

Hi Jarmo, this does not work: the first 100 rows get the label 1, and the rows beyond 100 are labeled 0?

jthi
Super User

Re: CALCULATION & LISTING OF DISTRIBUTION FEATURES OF A VARIABLE

As mentioned, you might have to add your classification column to the formula as a grouping column:

jthi_0-1760424942590.png

Col Rank(column, <ByVar, ...>, <<tie("average"|"arbitrary"|"row"|"minimum")) 

There are also plenty of other formulas you could use.
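
To illustrate why the grouping columns matter, here is a small sketch in plain Python rather than JSL. It is a hypothetical helper, not a JMP function. It keeps a running count per (participant, classification) pair, which plays the role of a rank computed within By groups, and turns that count into a sample number, so that every run of 100 rows in a group gets its own label instead of a single 1/0 flag.

```python
from collections import defaultdict


def sample_index(rows, block_size=100):
    """Return a sample number for each (participant, label) row, in row order.

    The running count per group is the within-group rank; integer division
    by block_size maps ranks 1..100 to sample 1, 101..200 to sample 2, etc.
    """
    counters = defaultdict(int)
    out = []
    for participant, label in rows:
        key = (participant, label)
        counters[key] += 1  # rank of this row within its group
        out.append((counters[key] - 1) // block_size + 1)
    return out
```

Grouping by participant alone would restart the count only when the participant changes, so T and NON T rows would be mixed into the same samples.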

-Jarmo
