
Functional Data Explorer Validation

Here4DOE
Level I

Hi,

 

I am using Functional Data Explorer to generate FPCs from 70 lots, each with about 15 functional data points. I want to use the model generated with these 70 lots to generate FPCs for a number of new lots. To do this, I use the validation option, classifying the 70 lots as the training set and the new lots as the validation set.

 

However, the FPCs generated for the new lots change depending on how many new lots are added. Should they not be the same, since the model generated from the original 70 lots is not changing?

 

Thanks.

 

 

4 REPLIES
Victor_G
Super User


Re: Functional Data Explorer Validation

Hi @Here4DOE,

 

Welcome to the Community!

 

To receive more feedback and responses for your questions, I would recommend reading Getting correct answers to correct questions quickly.

Could you provide some context and more info? Perhaps a screenshot of the situation, details about how you use the platform and how you set up the validation column, or even better, a sample dataset we could use to reproduce the situation?

 

I tried to reproduce your situation using the JMP dataset "Fermentation Process", and I did the analysis twice: first with all rows used (training + validation), then with all training rows kept but some validation rows excluded. For the validation rows in common between the two runs, the FPCs are the same:

[Screenshot: Victor_G_0-1741853311694.png]

This is the expected behavior: validation rows in the Functional Data Explorer act like a test set. You can calculate FPCs and other information for these rows based on the training fit, but the model fitting itself is not influenced by them. So you should expect the same values for the validation rows no matter how many rows are added. See Validation in JMP Modeling.
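To illustrate the principle outside of JMP, here is a toy Python sketch with made-up data, using ordinary PCA as a stand-in for FPCA (it is not the FDE algorithm itself): once the components are fit on the training curves only, a held-out curve's scores are a fixed projection, so they cannot change when more held-out curves are added.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "curves": 70 training lots and 10 new lots, each observed on a
# common grid of 15 time points (FDE smooths irregular grids first;
# this sketch skips that step).
train = rng.normal(size=(70, 15))
new = rng.normal(size=(10, 15))

# Fit the components on the training lots only.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:3]  # first 3 "FPCs"

def scores(curves):
    """Project curves onto the training-derived components."""
    return (curves - mean) @ components.T

# Score the first new lot twice: alone, and together with all ten new
# lots. Because the mean and components are frozen, its scores are
# identical in both runs.
assert np.allclose(scores(new[:1])[0], scores(new)[0])
```

That independence is exactly what the validation role should give you: each validation lot is scored on its own against the frozen training fit.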


I think something else might be giving you these results: perhaps a difference in the data splitting (not the same rows used for training, or some rows excluded?) or in the data processing/modeling (any pre-processing, scaling, or alignment? Same model and parameters used in both situations? Same number of shape functions?). A sketch of how that kind of leakage could produce the symptom is below.
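As a purely hypothetical illustration of the leakage idea (again a toy Python sketch, not how FDE works): if any step such as centering or scaling is refit on the training and validation rows together, the held-out scores shift whenever validation rows are added or removed, even though the components themselves never change.

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(size=(70, 15))
new = rng.normal(loc=0.5, size=(10, 15))

# Components fit on the training data only, as before.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:3]

def leaky_scores(validation):
    # Leakage: the centering is recomputed on training + validation
    # combined, so it moves whenever the validation set changes.
    leaky_mean = np.vstack([train, validation]).mean(axis=0)
    return (validation - leaky_mean) @ components.T

# The same lot now gets different scores depending on how many other
# validation lots are scored alongside it.
print(np.allclose(leaky_scores(new[:1])[0], leaky_scores(new)[0]))  # False: scores drift with set size
```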

 

Hope this helps in the meantime,

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)
Here4DOE
Level I


Re: Functional Data Explorer Validation

When I do as you describe (training plus all validation lots, and then training with one validation lot excluded), the FPCs are the same for the training lots but differ for the validation lot; see the screenshot below. All training rows were included in both runs, and my validation column is a numeric column containing 0s for training lots and 1s for validation lots. The X, Input is time, and the time points I have data for differ from lot to lot but all cover the same range.

 

[Screenshot: Here4DOE_0-1741856266057.png]
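In case it helps anyone trying to reproduce a setup like this outside of JMP, here is a hypothetical sketch of data in that shape: one row per lot-and-time observation, irregular time points per lot over a common range, and a numeric indicator that is 0 for training lots and 1 for validation lots (all column names and values here are invented).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
rows = []
for lot in range(1, 76):  # 70 training lots + 5 hypothetical new lots
    # roughly 15 observations per lot at irregular times over one common range
    for t in np.sort(rng.uniform(0, 100, size=15)):
        rows.append({
            "Lot": f"L{lot:03d}",
            "Time": t,
            "Response": np.sin(t / 20) + rng.normal(scale=0.1),
            "Validation": 0 if lot <= 70 else 1,  # 0 = training, 1 = validation
        })

df = pd.DataFrame(rows)
print(df.groupby("Validation")["Lot"].nunique())  # 70 training lots, 5 validation lots
```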

 

Victor_G
Super User


Re: Functional Data Explorer Validation

Sorry, but there is not enough information here to help you debug the situation.

I see your problem, but this is not the expected behavior; there shouldn't be differences depending on the number of validation rows.

Could you provide an anonymized dataset, or reproduce the problem on a standard JMP dataset?

I really believe there might be some data leakage or differences in the modeling somewhere.

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)


Re: Functional Data Explorer Validation

Hi @Here4DOE,

 

I agree with @Victor_G that we would need to see some of the data, or the problem reproduced on another data set, to find out how this is happening; FPC values should be generated consistently regardless of validation set size. How are you accessing the FPC scores? Are you saving the function summaries?

 

I would also suggest reaching out to support@jmp.com to set up a support case to look into this.

 

Thanks,

Ben

“All models are wrong, but some are useful”