Disclaimer: I have not looked at the data, models, or any analysis that one would create for your problem.
I'll focus most of my commentary on your first question, mostly in the form of questions plus a few thoughts.
1. Can you share more about the practical problem at hand? I believe any analysis and conclusions need to be considered and filtered through that lens vs. just looking at plots, data, and statistics.
2. Blind numeric comparison of parameter estimates is one, and only one, way to compare these two systems. I can all but guarantee the estimates will differ. Heck, rerun the empirical experiment and I'll all but guarantee a numeric difference there too. But are these differences important from a practical point of view? See question 1 above.
3. How much overlap within the factor space exists between the simulation data and the empirical data? Are there areas of interest that are not consistent?
4. How much overlap is there within the response space from each? Are there areas of interest that are not consistent?
5. How consistent are the residuals from the two models? After all, the residuals are each model's estimate of unexplained variation. Is any inconsistency between them large enough to be a problem?
6. Have you tried simulating from each model to see how sensitive the predicted response is to each factor across the modeling space?
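To make questions 3, 4, and 6 a bit more concrete, here is a minimal Python/NumPy sketch of the kind of checks I have in mind. It is not your analysis: the function names (`range_overlap`, `fraction_inside`, `max_prediction_gap`), the two toy models, and every number in the usage section are made up purely for illustration. It also only looks at one-factor-at-a-time ranges and a bounding box, which is a crude stand-in for real factor-space coverage.

```python
import numpy as np


def range_overlap(a, b):
    """Per-column share of the combined range that both data sets cover."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    shared = np.minimum(a.max(axis=0), b.max(axis=0)) - np.maximum(a.min(axis=0), b.min(axis=0))
    combined = np.maximum(a.max(axis=0), b.max(axis=0)) - np.minimum(a.min(axis=0), b.min(axis=0))
    # Guard against a column that is constant (and identical) in both data sets.
    safe = np.where(combined > 0, combined, 1.0)
    return np.where(combined > 0, np.clip(shared, 0.0, None) / safe, 1.0)


def fraction_inside(points, reference):
    """Share of rows in `points` that fall inside the bounding box of `reference`."""
    points, reference = np.asarray(points, dtype=float), np.asarray(reference, dtype=float)
    lo, hi = reference.min(axis=0), reference.max(axis=0)
    inside = np.all((points >= lo) & (points <= hi), axis=1)
    return float(inside.mean())


def max_prediction_gap(model_a, model_b, lows, highs, n=10_000, seed=0):
    """Largest absolute difference in predictions over random points in a factor box."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lows, highs, size=(n, len(lows)))
    return float(np.max(np.abs(model_a(x) - model_b(x))))


# --- Usage with placeholder values (NOT real data) ---
sim_factors = np.array([[0.0, 0.0], [10.0, 10.0], [4.0, 6.0]])   # hypothetical simulation runs
emp_factors = np.array([[5.0, 5.0], [15.0, 8.0], [7.0, 6.0]])    # hypothetical empirical runs

print("range overlap per factor:", range_overlap(sim_factors, emp_factors))
print("empirical runs inside simulation box:", fraction_inside(emp_factors, sim_factors))

# Two hypothetical fitted models whose coefficients differ slightly.
def sim_model(x):
    return 2.0 * x[:, 0] + 1.0 * x[:, 1]

def emp_model(x):
    return 2.1 * x[:, 0] + 1.0 * x[:, 1]

# Compare predictions only over the region both data sets cover.
gap = max_prediction_gap(sim_model, emp_model, lows=[5.0, 5.0], highs=[10.0, 8.0])
print("largest prediction gap in shared region:", gap)
```

The point of the last step is that the number to judge is the prediction gap against whatever difference matters in practice (question 1), not the raw difference between the coefficients. The same overlap functions can be pointed at the response columns for question 4.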
I invite others to comment and add their thoughts. But at the end of the day I think just looking at parameter estimates is a very, very narrow view of 'what's going on in the system?'