Hi @maryam_nourmand,

It depends what is your objective.

Saving a simulation formula can help you assess the coefficients distributions of your model, by switching in the diagnosis response with the diagnosis simulation formula : Simulate

The simulation formula is just a condensed version of the prediction formulas you can save from the model fit report: instead of having one probability column for each class and a final classification column, you only save one column with probabilities calculation and classification inside the same formula.

If you're more interested in the robustness of your model regarding variations in your inputs, then using the Simulator from Prediction Profiler with variations in the inputs (and possibly adding noise in the output) and run a Simulation Experiment may help you. You can then generate variations in your inputs, and possibly shift the distributions of your features to see how it affects your model, as well as increasing noise to see how robust your prediction model might be.

On a side note and related to the points brought by @dlehman1 :

My goal in finding the best parametric model is to simulate and generate data with a larger quantity than my initial dataset. I want to use more data for my machine learning model, but I want the simulated data to closely resemble and be similar to my initial dataset.

In order to simulate and generate data that is similar to the real data collected, you have to "mimic" the data generation process. If there are strong non-linearities, correlations, or other patterns found by your ML model and not considered by the parametric model, you should simulate and generate data with the ML model, or else you'll introduce a strong bias in the simulation/generated data. I don't understand the need to have a separate model for data generation and prediction for this use case.

Note that no matter how predictive a model might be, it's only a simplification of a phenomenon you're trying to understand and predict, so the data generation from this model may still be more or less biased (and probably generate less noisy outcomes than real data). But it can still be useful to assess its robustness to variations in the data like you intend to do.

And no sorry, I don't have a dataset that includes multiple stages treatment. I just found this one and use it to provide an illustration example that could fit your use case, that's all.

Hope this answer will help you,

Victor GUILLER

Scientific Expertise Engineer

L'Oréal - Data & Analytics