Hi @frankderuyck and @NominalGemsbok3,
There is no single best model, for several reasons: the choice of performance metric, threshold (a p-value cutoff, for example), estimation method, etc. In addition, a DSD does not have enough unique treatments (degrees of freedom) to estimate all effects, so you can easily end up with different, competing models that all perform well. You can think of a DSD as a kind of supersaturated design for a response surface model.
As @frankderuyck mentioned, the design structure creates partial aliasing (correlations) among interaction effects and among quadratic effects. As a result, you can't be 100% sure about the "real" impact on your target of the interaction and quadratic effects that are detected (by any modeling method), unless you add runs to better inform your model. You can, however, have more confidence in the main effects: the design structure avoids any correlation among main effects and between main effects and higher-order effects, so you can estimate them without bias from those terms.
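To make the correlation structure described above concrete, here is a small sketch in Python (not JSL). It builds a textbook 13-run, 6-factor DSD from a conference matrix, its foldover and one center run. This construction is my assumption for illustration; it is not the design table from this thread, which may have a different number of runs. The helper names `corr` and `max_abs_corr` are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Symmetric conference matrix of order 6 (zero diagonal, +/-1 elsewhere).
# Assumed textbook construction, not the design used in this thread.
C = [
    [0,  1,  1,  1,  1,  1],
    [1,  0,  1, -1, -1,  1],
    [1,  1,  0,  1, -1, -1],
    [1, -1,  1,  0,  1, -1],
    [1, -1, -1,  1,  0,  1],
    [1,  1, -1, -1,  1,  0],
]

# 13-run DSD: conference matrix, its foldover (sign-flipped copy), one center run.
design = C + [[-v for v in row] for row in C] + [[0] * 6]


def corr(x, y):
    """Pearson correlation; the numerator stays an exact integer for integer columns."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = sqrt((n * sum(a * a for a in x) - sx * sx)
               * (n * sum(b * b for b in y) - sy * sy))
    return num / den


# Model-matrix columns for the three families of effects.
main = {f"X{i+1}": [run[i] for run in design] for i in range(6)}
inter = {f"X{i+1}*X{j+1}": [run[i] * run[j] for run in design]
         for i, j in combinations(range(6), 2)}
quad = {f"X{i+1}^2": [run[i] ** 2 for run in design] for i in range(6)}


def max_abs_corr(a, b):
    """Largest absolute correlation between two families of columns."""
    return max(abs(corr(u, v))
               for ku, u in a.items() for kv, v in b.items() if ku != kv)


print("main vs main        :", max_abs_corr(main, main))
print("main vs interactions:", max_abs_corr(main, inter))
print("main vs quadratics  :", max_abs_corr(main, quad))
print("inter vs inter      :", max_abs_corr(inter, inter))
print("quad vs quad        :", max_abs_corr(quad, quad))
```

In this sketch the first three values are exactly zero (main effects are clear of everything), while the correlations among interactions and among quadratic effects are not. That is why different higher-order terms can stand in for each other in competing models.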
"All models are wrong, but some are useful"
To show this multiplicity of models on this example, I created a specific visualization called a raster plot (see "Raster plots or other visualization tools to help model evaluation and selection for DoEs" for how it was created). The multiplicity comes from the combinatorial explosion of terms that can enter the model: besides the intercept, there are 27 candidate effects to choose from (6 main effects, 15 two-factor interactions and 6 quadratic effects). I used the Stepwise platform with the "All Possible Models" option (up to 10 terms in the model, with the strong heredity assumption). Here are the resulting models, sorted by Rsquare value, showing which terms (columns) are included in each model (rows):

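To give an idea of the size of this combinatorial explosion, here is a small Python sketch that counts the candidate terms and the number of models with at most 10 terms, with and without strong heredity. The function name `strong_heredity_count` is hypothetical, and I am assuming the usual definition of strong heredity (an interaction needs both parent main effects, a quadratic effect needs its main effect). JMP may count or filter models slightly differently.

```python
from math import comb

n_factors = 6
n_main = n_factors                  # 6 main effects
n_inter = comb(n_factors, 2)        # 15 two-factor interactions
n_quad = n_factors                  # 6 quadratic effects
n_terms = n_main + n_inter + n_quad
print("candidate terms:", n_terms)  # 27, as stated in the post

max_terms = 10  # same cap as in the post

# Without any heredity restriction: any subset of up to 10 of the 27 terms.
unrestricted = sum(comb(n_terms, k) for k in range(max_terms + 1))


def strong_heredity_count(n_factors, max_terms):
    """Count models (including the empty one) that obey strong heredity.

    Pick m main effects first; only the comb(m, 2) interactions and the
    m quadratic effects built from those factors are then eligible.
    """
    total = 0
    for m in range(min(n_factors, max_terms) + 1):
        eligible = comb(m, 2) + m
        budget = max_terms - m  # terms left after the main effects
        ways_higher = sum(comb(eligible, j) for j in range(budget + 1))
        total += comb(n_factors, m) * ways_higher
    return total


restricted = strong_heredity_count(n_factors, max_terms)
print("models without heredity restriction:", unrestricted)
print("models with strong heredity        :", restricted)
```

Even with the heredity restriction, far more models are possible than there are runs to discriminate between them, which is what the raster plot illustrates.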
I prefer using an information criterion such as AICc (the lower the better) to compare multiple models, as it penalizes the use of too many terms and allows a fairer comparison of models with different complexities:

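For readers who want to see how the AICc penalty works, here is a minimal Python sketch for a least-squares model. I am assuming the usual Gaussian maximum-likelihood form, with the error variance counted as one estimated parameter. The function name and the numbers in the example are hypothetical, and values reported by JMP may differ by convention.

```python
from math import log, pi


def aicc_least_squares(sse, n, n_coef):
    """AICc for a least-squares fit with Gaussian errors.

    sse    : sum of squared residuals
    n      : number of runs
    n_coef : number of estimated coefficients, intercept included
    """
    k = n_coef + 1  # assumption: the error variance counts as a parameter
    if n - k - 1 <= 0:
        raise ValueError("too many parameters for the number of runs")
    neg2_loglik = n * (log(2 * pi) + log(sse / n) + 1)
    return neg2_loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical numbers, only to show the behaviour of the penalty:
# three extra terms that barely reduce the SSE make AICc worse.
small_model = aicc_least_squares(sse=10.0, n=17, n_coef=6)
large_model = aicc_least_squares(sse=9.0, n=17, n_coef=9)
print(small_model < large_model)
```

With so few runs, the small-sample correction term grows quickly with each added parameter, so AICc favors compact models much more strongly than Rsquare does.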
As you can see, most of the models agree on the presence of the main effects of the first 5 factors. For Factor 6, the results differ and there is no obvious pattern in the presence of this main effect. For interactions and quadratic effects, it's also hard to see strong patterns, except that some higher-order effects are left out most of the time: interactions Factor 1 x Factor 6, Factor 2 x Factor 4, Factor 3 x Factor 5, Factor 3 x Factor 6, Factor 4 x Factor 6 and Factor 5 x Factor 6. Among the quadratic effects, Factor 6 x Factor 6 is absent from most models. If we zoom in a little on the best models according to Rsquare value, there are some interesting observations on higher-order effects:

Interactions Factor 1 x Factor 3 and Factor 4 x Factor 5 tend to be chosen often in these models. Moreover, the quadratic effects for Factor 2 and Factor 4 are also often selected. These results tend to agree with those I obtained from the Fit Definitive Screening platform, with the same main effects and higher-order effects detected:

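The "how often is a term chosen" reading of the raster plot can also be expressed as a simple inclusion frequency per term. Here is a minimal Python sketch. The function name `inclusion_frequency` is hypothetical, and the models in `toy_models` are placeholders (terms "A", "B", "C"), not the models from this example.

```python
from collections import Counter


def inclusion_frequency(models):
    """Share of models in which each term appears (one set of terms per model)."""
    counts = Counter(term for model in models for term in set(model))
    return {term: counts[term] / len(models) for term in sorted(counts)}


# Placeholder models for illustration only, not results from this thread.
toy_models = [
    {"A", "B", "A*B"},
    {"A", "B", "B^2"},
    {"A", "B", "A*B", "B^2"},
    {"A", "C"},
]

freq = inclusion_frequency(toy_models)
for term, share in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(f"{term:4s} {share:.2f}")
```

Applied to the best models from "All Possible Models", a table like this is a compact complement to the raster plot: terms with a share close to 1 are robust to the choice of model, while terms with intermediate shares are the ones worth discussing or confirming with additional runs.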
Even when the comparison is limited to three estimation methods, you can see the same situation: different but equivalent models and combinations of terms. For example, with the Fit Definitive Screening, GenReg Normal Pruned Forward and GenReg Two Stage Forward estimation methods, we can compare both the performance of the models and the terms included:
- Performance: here with Rsquare and Rsquare adjusted, for explanatory purposes (how much of the variability in the response the model explains):

We can see that the first two methods show similar performance.
- Terms in the models:

Even though the first two estimation methods provide models with similar performance, the higher-order terms they include are different. They only agree on the inclusion of the interaction effect Factor 1 x Factor 3.
So a reasonable follow-up would be to discuss with domain experts which model(s) are the most sensible, and to use the Augment Design platform to confirm and/or refine the most relevant model. For example, you can augment the design and specify the model whose terms you want to estimate.
Please find the table with all scripts used in my response.
I hope this answer helps,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)