Solved: Re: find values and categorize

Pacco · Feb 1, 2018 12:43 PM

Hello,

I have the problem that I need to categorize some data. The selection of the values to be categorized would be much easier, if there was an automatized process. The problem is such: Table with 3 columns (1 category, 2 depths measurments, 3 number of data points).

1) I would like to get those that are below 0,5 (see example file column 2) for each lake respectively. This means, 0,5 should not be included in the evaluation for selection.

2) In addition, I would like to sort those values found in column 2 after the ones with most data points (N, column 3) by selecting the three deepest values found which contain the most data, respectively for each name of column 1.

3) To categorize them, I would like to create a new column and add a 1/2/3 for the ones with most data found to the ones with least data of those three. If two depths found to be equal in number of data points the deeper one should be categorized as 1 or 2.

See example file for some better understanding :-).

Thanks in advance for any solution, if that is possible with jsl.

XanGregg · Feb 7, 2018 11:23 AM

When I run the script and then make a subset of the selected rows with

dt << Subset( Copy formula( 0 ), Selected Rows( 1 ), Selected columns only( 0 ) )

I get

All the columns are indeed ranked, but within unique values of Stationsnamn and isBelow05. That's controled by the second and third argumenrs to Col Rank(). If you want the stations ranked together, you would remove that by variable from the Col Rank() arguments.

View solution in original post

XanGregg · Feb 5, 2018 09:23 AM

Something like this should help if I understand correctly.

dt = Current Data Table();
dt << New Column( "isbelow05", formula( :Profildjup <= 0.5 ) );
dt << New Column( "rank", formula( Col Rank( -(:n rows + :Profildjup), :isbelow05, :Stationsnamn ) ) );
dt << select where( :isbelow05 & :rank <= 3 );

If by depth "below" you mean numerically greater, then change the "<= 0.5" to ">= 0.5". Thanks for including the example file.

Pacco · Feb 6, 2018 04:44 PM

Hello Greg,

thanks for the reply and the script. Unfortunately, it does not work with my data. The results are two new columns with no data in the fist column and only one entry in the first cell of column 2 ("4,20533670002011e-315").

Any idea what could have gone wrong?

txnelson · Feb 7, 2018 6:04 AM

The script that Xan included works great with the sample data you attached. What is different between your real data and the Example data table you attached?

Jim

Pacco · Feb 7, 2018 04:40 AM

mhh, that´s strange. I just downloaded the sample file from here again (the difference was that I had made a selection with one additional column), however, even the sample file I uploaded shows me the same error I posted before. However, now it marked surface values with a one in the below 0,5 column. (see file attached)

I could simplify the question as well. Then the values just need to be selected and categorized according to their rank in highest numbers for each lake name section.

txnelson · Feb 7, 2018 05:10 AM

I have not sure what is causing your issue. When I open the data table you supplied in your last response, I see the you issue. I was able to correct the issue by Right Clicking on the column header for Rank, and selecting Column Info. It brings up the Column Info window. Select "Suppress Eval" as shown below, and then Click on the "Apply" button. Unselect the "Suppress Eval" and then Click on the "Apply" button again, and see if it corrected the values in the column

The other thing I would like you to try, is to delete the "isbelow05" and "rank" columns, and run the following script.

dt = Current Data Table();
dt << New Column( "isbelow05", formula( :Profildjup <= 0.5 ) );
wait(.5);
dt << New Column( "rank", formula( Col Rank( -(:n rows + :Profildjup), :isbelow05, :Stationsnamn ) ) );
dt << select where( :isbelow05 & :rank <= 3 );

If it works, what the issue may be is that your "isbelow05" formula isn't being completed before the "rank" column is being generated, and it is causing issues. The

wait(.5);

is giving it time to complete the creation of column "isbelow05"

Jim

Pacco · Feb 7, 2018 6:04 AM

Thanks for the ideas, the "suppress eval" selecteing and deselcting helped. However, the wait 5 did not change anything!

Pacco · Feb 7, 2018 08:19 AM

May I ask how the result looks like? Mine ranks the entire column. However, I am looking for ranking each section (each batch of same name, see column Stationsnamn).

XanGregg · Feb 7, 2018 11:23 AM

When I run the script and then make a subset of the selected rows with

dt << Subset( Copy formula( 0 ), Selected Rows( 1 ), Selected columns only( 0 ) )

I get

All the columns are indeed ranked, but within unique values of Stationsnamn and isBelow05. That's controled by the second and third argumenrs to Col Rank(). If you want the stations ranked together, you would remove that by variable from the Col Rank() arguments.

julian · Feb 8, 2018 10:22 AM

Hi @Pacco,

Are you using JMP 12 (or earlier) by chance? Col Rank() changed in JMP 13 to allow the additional arguments Xan is using.

From the docs:

JMP 12:

Col Rank(column)

JMP 13:

Col Rank(column, <ByVar, ...>, <<tie("average"|"arbitrary"|"row"|"minimum")