Solved: Data restructure

sanqub · Sep 29, 2016 01:19 PM

I need slightly different restructuring.

Suppose in a column there are 100 numbers. I would like to create a new column, which contains frequency of these numbers {similar like histogram bins in Excel}.

Further when select any bin it should select all the rows. In other words the rows and the bins should be linked.

Is there any feature which could help me create this.

Thanks for your help.

Dan_Obermiller · Sep 29, 2016 02:43 PM

I can think of a few ways to give you something like this, depending on exactly what you need.

Approach number one:

Use the modeling type of Ordinal for your numeric column (if they are truly continuous, you may have 100 rows, each with frequency one -- which I assume is not what you want).

Make a distribution of that numeric column.

The Frequencies report will give you a list of each level, along with a count for each level. Selecting a row of that report will select the corresponding rows in the table.

If you must have the frequencies in a data table, you can use Tables > Summary. Specify your numeric column as the Group variable and that is it. You will get a new table that has the numeric values along with the frequencies of each number. Selecting a row in the new table will actually highlight the rows in the original table, too.

Finally, if your numeric column is continuous and you wanted to make a histogram and save the frequency of each histogram bar back to the data table (rather than counting the frequency of each value) that is more involved. First, make your histogram using Distribution. From the red popup choose to Save Level Numbers. Use Table > Summary and choose the new Level column as the Group column. Now join that new table with the original table, matching by the Level column. Keep in mind that this assumes you are using all of the standard bins that JMP determines. The approach will work if you create a custom bin width, but re-creating the new histogram using your custom bin widths will not be automatic.

Dan Obermiller

View solution in original post

David_Burnham · Sep 29, 2016 02:37 PM

You can use the tabulate platform (Analyze>tabulate). Once you have launched the platform you will see a list of your columns. Your columns is numeric but for what you want to do you need to treat it as discrete. Right-click on the modelling icon and change it from continuous to either ordinal or nominal. Now drag onto the drop-zone for rows. You will have the frequency counts for each unique number. The counts are automatically linked to the data table - click on a count value and the associated rows are selected.

-Dave

sanqub · Sep 30, 2016 05:49 AM

Thanks a lot David.

It is indeed very useful feature. Just a quick query, Is it possible to store column created using tabulate option, in same worksheet or even new worksheet, but the counts should be linked to data table.

I used option "make into data table" but they are not linked to original data.

Thanks for your help.

Dan_Obermiller · Sep 29, 2016 02:43 PM

I can think of a few ways to give you something like this, depending on exactly what you need.

Approach number one:

Use the modeling type of Ordinal for your numeric column (if they are truly continuous, you may have 100 rows, each with frequency one -- which I assume is not what you want).

Make a distribution of that numeric column.

The Frequencies report will give you a list of each level, along with a count for each level. Selecting a row of that report will select the corresponding rows in the table.

If you must have the frequencies in a data table, you can use Tables > Summary. Specify your numeric column as the Group variable and that is it. You will get a new table that has the numeric values along with the frequencies of each number. Selecting a row in the new table will actually highlight the rows in the original table, too.

Finally, if your numeric column is continuous and you wanted to make a histogram and save the frequency of each histogram bar back to the data table (rather than counting the frequency of each value) that is more involved. First, make your histogram using Distribution. From the red popup choose to Save Level Numbers. Use Table > Summary and choose the new Level column as the Group column. Now join that new table with the original table, matching by the Level column. Keep in mind that this assumes you are using all of the standard bins that JMP determines. The approach will work if you create a custom bin width, but re-creating the new histogram using your custom bin widths will not be automatic.

Dan Obermiller

David_Burnham · Sep 29, 2016 02:49 PM

Yes, if the question more about saving binned summaries then things can get a bit fiddly I would use the option Save>Level Midpoints from the distribution platform. This will give you the bin midpoint value for each row of the table

-Dave

sanqub · Sep 30, 2016 07:24 AM

Apologies I just checked your first comment. Yes I wanted them to be saved and linked.

But your advice to save as mid point and DanO's third approach is more close to what I need.

Thanks I appreciate your help.

sanqub · Sep 30, 2016 07:22 AM

Thanks DanO for illustrative and all different possible options.

I think 3rd option is more useful in my case. But I have two questions. Once I have joined the level column and original column there is loss of data. I am loosing 40% of my data. [When I used level midpoints as suggested by David Burnham there is 25% reduction]. By loss I mean, Sum of "N rows" are reduced.

It would have been perfect if this newly joined column would somehow be linked to original data.

Thanks I really appreciate your help.

Data restructure

Re: Data restructure

Re: Data restructure

Re: Data restructure

Re: Data restructure

Re: Data restructure

Re: Data restructure

Re: Data restructure