Hi @Mark_Bailey , @statman , @P_Bartell ,
Thank you for the thoughts! I have attached my data table so you can replicate what I am seeing, if that is helpful. Right now, the rows without a value in the "Time" column are the failed runs. With those values missing, the Fit Definitive Screening script doesn't run. If I exclude those rows, the script still doesn't run. If I assign random values, it does run. I know it is best not to exclude any runs and that doing so will harm the model, but since I added 8 extra runs when generating the table, I was hoping that losing 4 to failures might still work. I was just surprised that the script won't even try!
I have started with the approach that @P_Bartell suggested and I have questions there too, but that should probably be another thread and I want to spend some more time on it first.
To answer some of the other questions that came up:
- Yes, I can try a different metric than "Time to Completion". The only one I could think of is "Time left in experiment", which I would try to maximize (it would be zero for the failed runs). This is a biochemical reaction that produces a signal that is supposed to increase with time until it reaches a maximum. In the failed runs, no signal was ever produced, so I don't have any other value to use.
- I have tried assigning values to the failed runs, such as numbers larger than the duration of the experiment, or just very large numbers (approximating a Time of infinity, because the reaction never worked). These seem to really skew the model and make all the other data points less impactful.
- To clarify, my replicates truly are repeats of the run conditions, not just multiple measurements of the same run. So in total we ran 31*8 = 248 reactions. These reactions naturally have some variability, so replicates are important. The values currently in the table are the means of the 8 replicates for each run. I thought there would be some value in including the data from each individual replicate in the model, but if I try to add the replicates as extra rows in the table, the Fit Definitive Screening script doesn't run.
- To address "What did I learn from the failures": if I look at the failed runs, 3 out of 4 have a combination of Low Mg and High Log2X1. This was expected to be detrimental to the reaction, and I had considered setting up a constraint so that this combination could not be used, but the DSD didn't allow it. If I had done a custom DOE, I would have added this constraint. (That said, there are cases where Low Mg and High Log2X1 did work, so this combination is not a guaranteed failure.)
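To illustrate the skew I mentioned when substituting very large values for the failed runs, here is a minimal Python sketch. The numbers are made up for illustration (they are not from my attached table), the single coded factor and the placeholder of 1000 are assumptions, and a one-factor least-squares line is only a stand-in for the real model.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope b of the simple least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical coded factor levels and times (minutes); None marks a failed run.
factor = [-1, -1, 0, 0, 1, 1, 1, 1]
time = [30.0, 32.0, 35.0, 37.0, 40.0, 42.0, None, None]

# Option A: drop the failed runs before fitting.
kept = [(f, t) for f, t in zip(factor, time) if t is not None]
slope_dropped = ols_slope([f for f, _ in kept], [t for _, t in kept])

# Option B: replace failures with a very large placeholder ("never finished").
PLACEHOLDER = 1000.0  # assumed value, far beyond the experiment duration
imputed = [t if t is not None else PLACEHOLDER for t in time]
slope_imputed = ols_slope(factor, imputed)

print(f"slope, failures dropped: {slope_dropped:.2f}")
print(f"slope, failures imputed: {slope_imputed:.2f}")
```

With these invented numbers, the two placeholder points dominate the fit: the estimated slope is driven almost entirely by the arbitrary placeholder rather than by the six runs that actually produced a signal, which is the same effect I see in my model.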
For now I will try modeling this data by excluding the failed runs, including the individual replicate data, and not using the Fit Definitive Screening script.
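As a rough sketch of the table layout I have in mind for that, here is some Python that turns one-row-per-run data into one row per replicate and skips replicates with no recorded time. The values are invented, the dictionary layout and the `to_long` helper are hypothetical, and I am assuming a failed run means none of its 8 replicates produced a signal.

```python
# Hypothetical wide-format data: one entry per design run with 8 replicate times.
# None marks a replicate that never produced a signal. All values are invented.
runs = [
    {"run": 1, "Mg": -1, "Log2X1": 1, "times": [None] * 8},  # failed run
    {"run": 2, "Mg": 1, "Log2X1": -1,
     "times": [30.0, 31.5, 29.0, 30.5, 32.0, 30.0, 31.0, 29.5]},
    {"run": 3, "Mg": 0, "Log2X1": 0,
     "times": [35.0, 36.0, 34.5, 35.5, 37.0, 36.5, 35.0, 34.0]},
]

def to_long(runs):
    """Return one row per replicate, skipping replicates with no recorded time."""
    rows = []
    for r in runs:
        for rep, t in enumerate(r["times"], start=1):
            if t is None:
                continue  # exclude failed replicates
            rows.append({"run": r["run"], "rep": rep,
                         "Mg": r["Mg"], "Log2X1": r["Log2X1"], "Time": t})
    return rows

long_rows = to_long(runs)
print(len(long_rows), "replicate rows kept")
```

Keeping the `run` identifier on each row means the replicates can still be traced back to the design run they came from.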