Hi Mark,
Thanks for your comments.
The data generation is very slow -- meaning it takes a long time, days to weeks, to actually get the original data. So, building a data set that is 50K in size is not feasible. I am using some of the Pro (running Pro 14.1.0) modeling functions to try to build a model that not only predicts the data well, but also captures the structure within the data. Having a larger data set that mimics the original structure would be very helpful in improving the model. Increasing the sample size (giving me larger train, validation, and test sets as well) while still capturing the structure of the data should help with training a better model -- at least that's my thinking.
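To illustrate what I mean by "a larger data set that mimics the original structure" (this is just a rough Python sketch with made-up columns x1/x2/y, not my actual data or workflow): draw many new rows from the empirical mean and covariance of the small original set, under an assumption of roughly multivariate-normal data, then split them into train/validation/test.

```python
# Rough sketch only: placeholder data, made-up column names, and an assumed
# multivariate-normal structure -- not my actual data or modeling workflow.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder for the small, slow-to-collect original data set
x1 = rng.normal(size=200)
original = pd.DataFrame({
    "x1": x1,
    "x2": 0.8 * x1 + 0.3 * rng.normal(size=200),
})
original["y"] = 2.0 * original["x1"] - 1.5 * original["x2"] + 0.1 * rng.normal(size=200)

# Assume (only for illustration) the data are roughly multivariate normal,
# so the empirical mean/covariance carry the structure I want to keep.
mu = original.mean().to_numpy()
cov = original.cov().to_numpy()
synthetic = pd.DataFrame(rng.multivariate_normal(mu, cov, size=50_000),
                         columns=original.columns)

# Bigger train/validation/test split drawn from the synthetic sample
train, rest = train_test_split(synthetic, test_size=0.4, random_state=1)
valid, test = train_test_split(rest, test_size=0.5, random_state=1)
print(train.shape, valid.shape, test.shape)  # (30000, 3) (10000, 3) (10000, 3)
```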
It's not strictly a time series, although one could view it that way. The data events are not necessarily tied to each other in time. Another poster suggested a time series analysis, which I will look into.
Unfortunately, when I use the simulation feature of the Profiler to generate the larger data sets, the simulated data doesn't preserve any of the structure of the original data. I can build a new model on the simulated data that fits it very well, but when I evaluate both models on the original data, the new model is not as good a fit as the original model (the one built on the original data).
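For concreteness, here is roughly how I'm thinking about that comparison, sketched in Python with placeholder data and a generic linear model rather than my actual data or JMP model: fit one model on the original data and one on the simulated data, score both against the original data, and also compare the input correlation matrices to see whether the simulated set kept the original structure.

```python
# Hedged sketch of the comparison described above. Placeholder data and a generic
# linear model stand in for my real data and model; the "simulated" set here is
# built with independently drawn inputs, which is the kind of lost structure I mean.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Placeholder for the small original data set: x1 and x2 are correlated
x1 = rng.normal(size=200)
original = pd.DataFrame({"x1": x1, "x2": 0.8 * x1 + 0.3 * rng.normal(size=200)})
original["y"] = 2.0 * original["x1"] - 1.5 * original["x2"] ** 2 + 0.1 * rng.normal(size=200)

# Placeholder for the larger simulated data set: inputs drawn independently
simulated = pd.DataFrame({
    "x1": rng.normal(size=50_000),
    "x2": rng.normal(scale=original["x2"].std(), size=50_000),
})
simulated["y"] = 2.0 * simulated["x1"] - 1.5 * simulated["x2"] ** 2 + 0.1 * rng.normal(size=50_000)

X = ["x1", "x2"]
model_orig = LinearRegression().fit(original[X], original["y"])
model_sim = LinearRegression().fit(simulated[X], simulated["y"])

# Both models evaluated on the ORIGINAL data
print("original-data model, R2 on original :",
      r2_score(original["y"], model_orig.predict(original[X])))
print("simulated-data model, R2 on original:",
      r2_score(original["y"], model_sim.predict(original[X])))

# Did the simulated inputs keep the original correlation structure?
print(original[X].corr().round(2))
print(simulated[X].corr().round(2))
```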
Sorry I can't be more specific, but I can't discuss the details of the data. I hope my comments address the points you brought up.
Thanks!
For what I'm wanting to do, the simulation feature in the Profiler seems like the right way to go, but it's not giving me what I'm trying to get. The Platform Simulation is definitely not the right way to go, or at least I can't see how to adjust it to give me what I'm after.