In your case, the p-values are 0.00000202645594483411 for Pearson's chi-square test and 0.000101189787981198 for Fisher's exact test. Both p-values are small, so I think you can report either result (traditionally, we only write "p < 0.001" when the p-value is less than 0.001).
"Valid" in statistical testing means that the test always controls the type I error, i.e. the probability of a type I error for the test is always less than or equal to the nominal alpha level.
Pearson's chi-square test is an asymptotic test, and it does not guarantee control of the type I error in finite samples. This "invalidness" of the chi-square test becomes more apparent when some expected counts are small. The very rough rule of thumb, "some expected counts are less than 5", is sometimes called "Cochran's rule".
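To see how Cochran's rule applies here, the expected counts under independence can be computed from the margins alone. The following is a minimal Python sketch that uses only the totals of your table (row totals 90 and 18, column totals 98 and 10); the variable names are just for illustration.

```python
# Expected counts under independence, computed from the margins alone.
row_totals = [90, 18]   # row totals of the table discussed in this answer
col_totals = [98, 10]   # column totals of the table discussed in this answer
n = sum(row_totals)     # grand total, 108

# Expected count for cell (i, j) is row_total_i * col_total_j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Cells flagged by the rough "expected count < 5" rule of thumb.
small_cells = [
    (i, j, e)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]

for row in expected:
    print([round(e, 2) for e in row])
print("cells with expected count < 5:", small_cells)
```

With these margins, one expected count is about 1.67, well below 5, so the rule of thumb flags the chi-square approximation as questionable for this table.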
Fisher's exact test always controls the type I error, i.e. the probability of a type I error for Fisher's exact test is always LESS THAN or equal to the nominal alpha level. If you are mainly concerned about false positives in your situation, Fisher's exact test might be a good choice. If you are mainly concerned about false negatives, Pearson's chi-square test might be better.
As you said, Fisher's exact test is not actually "exact". It is almost always conservative (at least in the Neyman-Pearson sense). Fisher's exact test is "valid" only in the above sense. All "exact" tests and "exact" confidence intervals for discrete variables (without randomization to break the discreteness) are conservative. For example, exact tests and exact confidence intervals (a.k.a. Clopper-Pearson confidence intervals) for binomial probabilities are also conservative.
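This conservativeness is easy to check numerically in the binomial case. The sketch below, in plain Python, computes the actual type I error of a one-sided exact binomial test. The choices n = 10, p0 = 0.5 and alpha = 0.05 are arbitrary illustrations, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_one_sided_size(n, p0, alpha):
    """Actual type I error of the one-sided exact binomial test of
    H0: p = p0 against p > p0, which rejects when the exact p-value
    P(X >= x_obs | p0) is at most alpha."""
    size = 0.0
    for x in range(n + 1):
        p_value = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
        if p_value <= alpha:
            # x lies in the rejection region; add its null probability.
            size += binom_pmf(x, n, p0)
    return size


size = exact_one_sided_size(n=10, p0=0.5, alpha=0.05)
print(size)  # 11/1024, about 0.0107
```

The test rejects only for 9 or 10 successes, so its real type I error is about 0.011 even though the nominal level is 0.05. Because the outcome is discrete, the size cannot be tuned to hit 0.05 exactly without randomization.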
Fisher's exact test conditions on ("fixes") both the row totals and the column totals, even when the sampling design does not fix them. Prof. D. R. Cox called this kind of "fix" "technical conditioning". Although people are often unaware of it, the most popular example of technical conditioning is regression analysis (linear regression, logistic regression and so on). Even though the X variables in regression models are often random variables, we usually "fix" them when we calculate confidence intervals and tests. People are usually not interested in the relationships among the X variables, so we can remove these nuisance parts by "technical conditioning".
If the data for two groups are sampled from two independent binomial distributions (two infinite populations), the design fixes only the row totals (say, n1 and n2). Even so, if the two binomial probabilities are equal, we can think of this process as one in which we first sample n1 + n2 individuals from a single binomial distribution and then divide this first sample into groups of n1 and n2. This two-step sampling is exactly the same as the one-step sampling from two infinite populations. And once we have sampled the n1 + n2 individuals, the numbers of events and non-events in this first sample are fixed.
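The consequence of this argument can be checked numerically: when the two binomial probabilities are equal, the distribution of the events in group 1, given the total number of events, is hypergeometric and does not depend on the common probability p. The following Python sketch compares the two. The function names are hypothetical, and the values of p are arbitrary; n1, n2 and the event total reuse the margins of your table.

```python
from math import comb, isclose


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(x1, s, n1, n2, p):
    """P(X1 = x1 | X1 + X2 = s) for independent X1 ~ Bin(n1, p), X2 ~ Bin(n2, p)."""
    joint = binom_pmf(x1, n1, p) * binom_pmf(s - x1, n2, p)
    total = binom_pmf(s, n1 + n2, p)  # X1 + X2 ~ Bin(n1 + n2, p)
    return joint / total


def hypergeom_pmf(x1, s, n1, n2):
    """Hypergeometric probability of x1 events in group 1 when s events are
    split at random between groups of size n1 and n2."""
    return comb(n1, x1) * comb(n2, s - x1) / comb(n1 + n2, s)


n1, n2, s = 90, 18, 10  # group sizes and total number of events

# The conditional distribution matches the hypergeometric one for every p tried.
matches = all(
    isclose(conditional_pmf(x1, s, n1, n2, p), hypergeom_pmf(x1, s, n1, n2), rel_tol=1e-9)
    for p in (0.1, 0.5, 0.9)
    for x1 in range(max(0, s - n2), min(n1, s) + 1)
)
print(matches)
```

Because p cancels out of the conditional distribution, conditioning on the column totals removes the unknown nuisance parameter, which is exactly what Fisher's exact test exploits.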
In your case, the data can be seen as a random sample of 108 individuals from one multinomial distribution, and H0 must be "p11/p12 = p21/p22". Assume that your infinite population contains infinitely many red balls and black balls. First, sample 108 balls from this infinite population. This first sample fixes the column totals (98 and 10 in your case), as in the binomial case above. Then, separate this sample into two groups. This fixes the row totals (90 and 18 in your case). This two-step sampling generates the same distribution as the original sampling from one multinomial distribution.
Note that the number of degrees of freedom for a 2x2 table is one, not two, three or four. This means that the chi-square test also "fixes" the column totals and row totals.
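One way to see the single degree of freedom is to enumerate every 2x2 table that is compatible with the fixed margins: once the top-left cell is chosen, the other three cells are determined. The Python sketch below does this for the margins of your table and attaches the hypergeometric probability of each table under H0. It does not use your observed cell counts, and the variable names are only illustrative.

```python
from math import comb

row_totals = (90, 18)
col_totals = (98, 10)
n = sum(row_totals)  # 108

# Feasible range of the top-left cell a, so that all four cells are non-negative.
lo = max(0, row_totals[0] - col_totals[1])
hi = min(row_totals[0], col_totals[0])

tables = []
for a in range(lo, hi + 1):
    b = row_totals[0] - a   # top-right cell follows from the first row total
    c = col_totals[0] - a   # bottom-left cell follows from the first column total
    d = row_totals[1] - c   # bottom-right cell follows from the second row total
    # Hypergeometric probability of this table given both margins.
    prob = comb(row_totals[0], a) * comb(row_totals[1], c) / comb(n, col_totals[0])
    tables.append(((a, b), (c, d), prob))

for top, bottom, prob in tables:
    print(top, bottom, f"{prob:.3g}")
print("number of possible tables:", len(tables))
```

Only 11 tables are possible, indexed by one free cell. Fisher's exact test sums the probabilities of the tables in this list that are at least as extreme as the observed one, which is also why its p-value can only take a few distinct values.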
Anyway, Fisher's exact test is always conservative, and this conservativeness becomes more apparent for smaller samples. The chi-square test is an asymptotic test. It does not show this "too conservative" behavior, although it sometimes fails to control the type I error.
Also note that I do not think you can say anything about the "causal effect" of X1 on Y in this case. The above explanation relates only to how to calculate "valid" statistical tests.
Yusuke Ono (Senior Tester at JMP Japan)