I will offer some thoughts, which may or may not be useful, regarding the questions posed by DS. My thoughts align with Pete's, as I have had similar experiences. In the grand scheme of trying to understand causality, there are many different approaches. In ALL cases, iteration is required. I will oversimplify two of those approaches:
1. A statistical approach: This is an attempt to look for clues in existing historical/observational data (AKA data mining). It often entails the use of some kind of regression approach (of which there are many), ultimately looking for patterns in the response variables (i.e., Y's) and trying to correlate those with patterns in the input variables (i.e., x's). Instead of relying only on a quantitative look (e.g., bootstrapping, simulation, etc.), use graphical methods to look for patterns (a sketch of this graphical first look follows this list). The correlation of patterns should inspire explanations as to why those patterns exist. These explanations are hypotheses. Once hypotheses have been formulated, gather data via directed sampling (e.g., DOE) to build confidence in causal relationships.
2. A scientific/engineering approach (AKA the scientific method): This approach starts with hypotheses about the potential effects of factors on response variables. These hypotheses are a function of SME experience, intuition, education, understanding of accepted scientific theory, etc. When there is a large number of variables (e.g., >15) and the hypotheses are general, directed sampling can be used to separate and partition the components of variation into smaller subsets (see the second sketch below). I would strongly recommend this approach for your production processes, as it does not "disturb" the process. As the number of potentially influential factors is reduced, experimentation can be used to build confidence in causal relationships (as you indicate).
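To make approach 1 concrete, here is a minimal Python sketch of the graphical-first look at historical data. The file name and column names (y, x1..x4) are hypothetical placeholders, and pandas/matplotlib are just one convenient tool set, not the only way to do this:

```python
# A minimal sketch of the "look for clues in historical data" step,
# assuming a CSV of historical process data with a response column "y"
# and input columns "x1".."x4" (all names here are hypothetical).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("historical_process_data.csv")  # hypothetical file
inputs = ["x1", "x2", "x3", "x4"]

# Graphical first: one scatter plot of the response vs. each input,
# looking for patterns that suggest hypotheses worth testing.
fig, axes = plt.subplots(1, len(inputs), figsize=(4 * len(inputs), 4), sharey=True)
for ax, x in zip(axes, inputs):
    ax.scatter(df[x], df["y"], alpha=0.5)
    ax.set_xlabel(x)
axes[0].set_ylabel("y")
plt.tight_layout()
plt.show()

# Quantitative second: rank-order the inputs by correlation with y.
# Correlation here only suggests hypotheses; it does not establish cause.
print(df[inputs + ["y"]].corr()["y"].drop("y").sort_values(key=abs, ascending=False))
```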
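And here is a minimal sketch of the directed-sampling idea in approach 2: a balanced nested sampling study (lots, units within lots, repeat measurements within units) whose mean squares partition the total variation into components, without disturbing the process. The sampling tree and the simulated data are assumptions made only so the example runs end to end:

```python
# A minimal sketch of partitioning variation with a directed sampling
# (nested) study rather than an experiment: sample a lots, b units per
# lot, and n repeat measurements per unit, then split the total
# variation into lot-to-lot, unit-to-unit, and measurement components.
import numpy as np

rng = np.random.default_rng(1)
a, b, n = 5, 4, 3  # lots, units per lot, measurements per unit

# Simulated "truth" (assumed for illustration): most variation is lot-to-lot.
y = (rng.normal(0, 2.0, size=(a, 1, 1))      # lot effect,  sd = 2.0
     + rng.normal(0, 1.0, size=(a, b, 1))    # unit effect, sd = 1.0
     + rng.normal(0, 0.5, size=(a, b, n)))   # measurement, sd = 0.5

# Method-of-moments estimates from the nested ANOVA mean squares.
unit_means = y.mean(axis=2)                  # shape (a, b)
lot_means = unit_means.mean(axis=1)          # shape (a,)
ms_meas = ((y - unit_means[:, :, None]) ** 2).sum() / (a * b * (n - 1))
ms_unit = n * ((unit_means - lot_means[:, None]) ** 2).sum() / (a * (b - 1))
ms_lot = b * n * ((lot_means - lot_means.mean()) ** 2).sum() / (a - 1)

print("measurement variance: ", ms_meas)
print("unit-to-unit variance:", max((ms_unit - ms_meas) / n, 0.0))
print("lot-to-lot variance:  ", max((ms_lot - ms_unit) / (b * n), 0.0))
```

Whichever component dominates tells you where to direct the next, smaller study.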
Which approach will be more effective and efficient is always situation dependent. When we are extremely low on the knowledge continuum (e.g., there is a lack of subject-matter knowledge), approach 1 may be advantageous. If hypotheses already exist, approach 2 may be the better choice.
Regarding the inability to experiment on the production process (without getting into an argument about short-term vs. long-term thinking): this can add complexity to the investigation. The issue is inference space. What is wanted is for the pilot line to mirror the production process. This is a challenge because the pilot line is in a lab and is often a completely different inference space. Often this means noise will need to be added to the pilot line (e.g., vary ambient conditions, use multiple lots of raw materials). When doing studies on a small scale, it is important to exaggerate effects, both the factor effects (e.g., bold level settings) and the noise effects. The exaggerated noise effects can be accommodated in experiments using blocking and split-plot strategies (a sketch follows).
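As a rough illustration of the split-plot idea, here is a sketch that lays out a small pilot-line study with an exaggerated, hard-to-change noise factor (raw-material lot) as the whole-plot factor and a 2^2 factorial of process factors randomized within each lot. All factor names are hypothetical, and the analysis would of course need to respect the two error strata (whole-plot and split-plot):

```python
# A minimal sketch (names are hypothetical) of a split-plot layout:
# the exaggerated noise factor (raw-material lot) is hard to change,
# so it becomes the whole-plot factor, and a 2^2 factorial in two
# process factors is randomized within each lot.
from itertools import product
import random

random.seed(7)
lots = ["lot_A", "lot_B", "lot_C"]             # deliberately different lots
factorial = list(product([-1, +1], [-1, +1]))  # bold low/high settings

run_order = []
for lot in random.sample(lots, len(lots)):     # randomize the whole plots
    runs = factorial[:]
    random.shuffle(runs)                       # randomize within each plot
    run_order.extend((lot, temp, speed) for temp, speed in runs)

for i, (lot, temp, speed) in enumerate(run_order, 1):
    print(f"run {i:2d}: {lot}  temp={temp:+d}  speed={speed:+d}")
```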
IMHO, simulating and bootstrapping (or any quantitative method) are completely dependent on how you acquired the data and how representative the data are of future conditions. If the data used for simulating or bootstrapping are not representative, neither method will be useful.
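A toy illustration of that point: a bootstrap interval computed from an unrepresentative sample is precise but answers the wrong question. Everything here (the "process" and the sampling bias) is simulated purely for illustration:

```python
# A bootstrap interval built from a biased sample is a precise answer
# to the wrong question; resampling cannot repair data that never
# represented the process in the first place.
import numpy as np

rng = np.random.default_rng(42)
process = rng.normal(100.0, 5.0, size=100_000)   # true process mean = 100

# Unrepresentative sample: e.g., data recorded only when output ran high.
biased_sample = np.sort(process)[-200:]          # extreme, but makes the point
boot_means = [rng.choice(biased_sample, size=biased_sample.size).mean()
              for _ in range(2000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])

print(f"true process mean: {process.mean():.1f}")
print(f"bootstrap 95% CI from biased sample: ({lo:.1f}, {hi:.1f})")
# The interval is tight and misses 100 entirely.
```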
"All models are wrong, some are useful" G.E.P. Box