What is the role of "Freq" in the Partition decision tree model?

Created: Aug 5, 2020 5:33 AM | Last Modified: Aug 5, 2020 5:48 AM
A question for the experts: are there any examples showing how the "Freq" role affects the decision tree model in the Partition platform? I searched for a script and couldn't find one. Thanks!

The easiest way to see this is to create a simple example. I created a small four-row data table:

The actual values don't matter for what we are doing here.

Now go to the Partition Model platform. Specify the Y column as the Y, X as the X. Ignore the Freq column for now. This is what you see once the platform launches:

Notice the count that I circled in red. Our table has four rows, so the count is 4.

Now relaunch the Partition Platform. Hit Recall, and then specify the Freq column as the Freq role, and click OK.

This is what you see when the platform launches:

Notice now that the count is 40. Why? Because the Freq column had 10 for each row. This essentially tells JMP that although there is only one row in the data table, that observation actually occurs 10 times. Therefore, there are 40 observations in those 4 data table rows.
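The same idea can be sketched outside JMP. The short Python example below is only an illustration of the concept, not JMP's implementation. The Y and X values are made up (the thread does not show them); only the Freq value of 10 per row comes from the example above. It repeats each row according to its frequency and counts the result.

```python
# Hypothetical four-row table; only Freq = 10 per row comes from the example.
rows = [
    {"Y": 1.2, "X": "a", "Freq": 10},
    {"Y": 3.4, "X": "a", "Freq": 10},
    {"Y": 2.1, "X": "b", "Freq": 10},
    {"Y": 4.8, "X": "b", "Freq": 10},
]

def expand_by_freq(table):
    """Repeat each row Freq times, dropping the Freq column itself."""
    expanded = []
    for row in table:
        values = {k: v for k, v in row.items() if k != "Freq"}
        expanded.extend([values] * row["Freq"])
    return expanded

count_without_freq = len(rows)                  # rows in the data table
count_with_freq = len(expand_by_freq(rows))     # observations the analysis sees
print(count_without_freq, count_with_freq)      # 4 40
```

The count with Freq is just the sum of the Freq column, which is why four rows with a frequency of 10 each are reported as 40 observations.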

Dan Obermiller

Re: What is the role of "Freq" in the Partition decision tree model?

If you click the "**Help**" button, it describes each of the input fields. Basically, the values in a column placed into the Freq selection box determine how many times each row is counted. So if the value of the Freq column is 2, that row's analysis variables enter the analysis twice.

Jim

Re: What is the role of "Freq" in the Partition decision tree model?

Thanks, Jim!

"Freq: A column whose numeric values assign a frequency to each row in the analysis."

Without a concrete example, I still can't understand what this parameter does.

Re: What is the role of "Freq" in the Partition decision tree model?

For example, my raw data is generated daily and has a "date" column. But in the decision tree model, "date" is not an X factor.

Re: What is the role of "Freq" in the Partition decision tree model?

I don't think you would want to use your Date column as a frequency variable.

Dan Obermiller

Re: What is the role of "Freq" in the Partition decision tree model?

Since the actual value of a date is the number of seconds since Jan 1, 1904, using the date as the Freq variable would duplicate each row's input into the model billions of times. If your date value were today's date, 08/05/2020, it would duplicate the input of that row 3,679,463,839 times (i.e., about 3.7 billion).
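To check the size of such a value, here is a small Python sketch of the conversion. It assumes only what is stated above, that a date is stored as seconds since Jan 1, 1904; the function names are made up for illustration and ignore time zones.

```python
from datetime import datetime, timedelta

# Assumed epoch, per the description above: seconds counted from Jan 1, 1904.
JMP_EPOCH = datetime(1904, 1, 1)

def to_jmp_seconds(moment):
    """Seconds elapsed between the 1904 epoch and the given datetime."""
    return int((moment - JMP_EPOCH).total_seconds())

def from_jmp_seconds(seconds):
    """Datetime corresponding to a seconds-since-1904 value."""
    return JMP_EPOCH + timedelta(seconds=seconds)

midnight = to_jmp_seconds(datetime(2020, 8, 5))
print(midnight)                       # 3679430400
print(from_jmp_seconds(3679463839))   # 2020-08-05 09:17:19
```

Midnight on 08/05/2020 is 3,679,430,400, and the value quoted above corresponds to a time stamp later that same morning.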

Jim

Re: What is the role of "Freq" in the Partition decision tree model?

Thanks!

I get the idea of how Freq works now.

The date in my data is an 8-digit number that has been converted to a normal date format.

If it is used as the frequency, it should be equivalent to giving more weight to the data from the latest dates.

Re: What is the role of "Freq" in the Partition decision tree model?

Although you are correct that the Date column as a frequency will weight the latest dates heaviest, I would not recommend this approach. For starters, you will be telling JMP that you have MUCH more data than you really do. This will affect any inference-based statistics that are calculated as well as model statistics such as RSquare, AICc, etc. Plus, because you are using the number of seconds in a day, you are weighting today by 86400 more observations than yesterday. Is that the right amount of weighting?
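A rough Python sketch of that inflation, assuming the date value is the number of seconds since Jan 1, 1904 at midnight. The two dates are only examples and the helper name is hypothetical.

```python
from datetime import date

JMP_EPOCH = date(1904, 1, 1)   # assumed epoch for the stored date value
SECONDS_PER_DAY = 86_400

def date_as_freq(day):
    """Frequency a row would get if its (midnight) date value were used as Freq."""
    return (day - JMP_EPOCH).days * SECONDS_PER_DAY

yesterday = date_as_freq(date(2020, 8, 4))
today = date_as_freq(date(2020, 8, 5))

actual_rows = 2
implied_observations = yesterday + today   # what the analysis would treat as N
extra = today - yesterday                  # 86400 more "observations" for today
ratio = today / yesterday                  # relative emphasis on today
print(actual_rows, implied_observations, extra, ratio)
```

Two real rows turn into more than seven billion implied observations, while the ratio between the two frequencies is only about 1.00002. So the sample size is hugely overstated, yet in relative terms the most recent day is barely favored.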

You used the word "weight" in your comment (and I did here, too), and I think that is what you really want: rather than using frequency, create a column for WEIGHTING individual observations. Develop a weighting scheme that you feel is appropriate, create a column that contains those weights, and specify that column as the Weight in the dialog box. This has the advantage of stating that certain observations are more important without modifying your sample size, which keeps the inference statistics valid.
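As one possible illustration of such a scheme, the Python sketch below builds recency weights that halve for every 30 days of age. The half-life, the dates, and the function name are all hypothetical choices, not a recommendation; the point is only that the weights live in their own column and the number of rows does not change.

```python
from datetime import date

HALF_LIFE_DAYS = 30  # hypothetical; choose whatever suits your data

def recency_weight(row_date, latest_date, half_life=HALF_LIFE_DAYS):
    """Weight of 1 for the newest row, halving for each half_life days of age."""
    age_days = (latest_date - row_date).days
    return 0.5 ** (age_days / half_life)

dates = [date(2020, 6, 6), date(2020, 7, 6), date(2020, 8, 5)]
latest = max(dates)
weights = [recency_weight(d, latest) for d in dates]
print(weights)   # [0.25, 0.5, 1.0]
```

The resulting values would go into a new column used as the weight, leaving the row count at three.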

Dan Obermiller

Re: What is the role of "Freq" in the Partition decision tree model?

Thanks, Dan Obermiller!

I still need to study this area more.
