The intercept is always some variation of where the line (or the regression plane, in models with more than a single term) sits on the Y axis when the other terms contribute nothing (i.e. are set to 0). When we center just the interaction term, "nothing" takes on a changed meaning for the main effects. Not numerically: we're still talking about 0 of each predictor. But what that zero points at in the population changes, because mean-centering the interaction moves the point of zero interaction effect to the average behavior in the population rather than the literal origin (0,0) of the predictors.
In short, in a model like this, the intercept is more like an estimate in an analysis of covariance: an adjusted estimate based on statistically removing the average effect of the interaction from the plane. I don't find that explanation particularly helpful conceptually, so if you'll allow it, I'm going to talk it through with the example I used before.
I always find it helpful to see these things visually. Here's that example again; let's look at the regression planes (which are the same) for the centered polynomial model (left) and the uncentered model (right). I've added a response grid at 50 for both (which is the intercept of the centered model), and blue dots to show where the intercepts of the models are.

Starting on the right, the intercept has a very easy interpretation. It's the value of Y where the response plane crosses 0 for both X1 and X2. That is, when there are 0 study hours and 0 previous knowledge. Easy. (Important for later: we aren't even thinking about the interaction term here because in this kind of model, when X1=0 and X2=0, we know that the interaction adds nothing, since the b3 coefficient is being multiplied by zeros.)
For the centered model on the left, the model intercept of 50 is well above the value when there is 0 of both Xs. But why the bump of roughly 20 exam points?
A score of 50 is where we have roughly 40 of Previous Knowledge and 0 Study Hours; or, where we have 0 Previous Knowledge and 4 Study Hours. Here I've toggled on the value grids so you can see them line up with the blue dots I put before:

So, what gives?! We know these are not the means of Previous Knowledge and Study Hours, so it's not as simple as holding one variable constant and the other at its mean. One thing might pop out to you here: these points are a symmetric distance up the response plane from the "true" (X1=0 and X2=0) intercept. And the only term in our model that exerts symmetric influence (in a scaled sense) on Y across X1 and X2 is b3, the interaction term.
What we're not accounting for yet is setting the *interaction* term, b3, to 0. And that zero happens at a different place in a model like this than where X1 and X2 are 0 (because of that centering); it happens at the means of X1 and X2, so we're talking about *average* interaction. The intercept of 50 here reflects a kind of adjusted baseline: it's what we would get at X1 = 0 and X2 = 0 if there were no interaction effect in the population. Conceptually, it's an estimate of the intercept adjusted for the presence of the interaction.
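If the algebra helps, here is a short sketch of why the two intercepts differ. I'm assuming the centered model has the form below, with only the product term centered; the symbols b0, b1, b2 and the sample means X̄1, X̄2 are my notation, not something taken from the fitted output.

$$
\begin{aligned}
Y &= b_0 + b_1 X_1 + b_2 X_2 + b_3\,(X_1 - \bar{X}_1)(X_2 - \bar{X}_2) \\
  &= \underbrace{(b_0 + b_3 \bar{X}_1 \bar{X}_2)}_{\text{uncentered intercept}}
   + (b_1 - b_3 \bar{X}_2)\,X_1
   + (b_2 - b_3 \bar{X}_1)\,X_2
   + b_3\,X_1 X_2
\end{aligned}
$$

Under that assumption, the uncentered intercept is b0 plus b3 times the product of the two means. With an uncentered intercept of about 30 and a centered intercept of about 50, that would put b3·X̄1·X̄2 at roughly −20, which is consistent with a negative b3 and positive means.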
To me, this term resists a conceptual interpretation quite a bit more than a typical intercept, but here's how I would frame it in this case. With the negative coefficient for the interaction term, we know that these factors interact antagonistically (more of one weakens the relationship between the response Y and the other factor). That is, the more people know ahead of time, the less value they get from studying, on average. Or, the more people study, the less value they get, on average, from how much they knew. The intercept in this model is trying to tell us what exam scores would be like *if that were not the case.* If that interaction weren't the state of the world we measured, then people who studied 0 hours would have gotten more value from their previous knowledge, and so they would do better on the exam: a bump up from an intercept of 30 to 50. And if that interaction weren't the state of the world we measured, then people who had 0 previous knowledge would have gotten more value from their studying, hence that same bump of the intercept from 30 to 50. Like an ANCOVA, this is a statistical "as if" thought experiment.
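If you want to poke at this yourself, here is a minimal sketch in Python. To be clear, the data are simulated and every number in it (sample size, ranges, coefficients, noise) is a made-up assumption, not the data behind the plots. It fits the uncentered model and a model with only the product term centered, then compares the two intercepts.

```python
import numpy as np

# Hypothetical simulated data: none of these numbers come from the original example.
rng = np.random.default_rng(0)
n = 500
hours = rng.uniform(0, 10, n)       # X1: study hours
knowledge = rng.uniform(0, 80, n)   # X2: previous knowledge
score = (30 + 4 * hours + 0.5 * knowledge
         - 0.1 * hours * knowledge + rng.normal(0, 2, n))

m1, m2 = hours.mean(), knowledge.mean()
ones = np.ones(n)

# Uncentered model: Y = a0 + a1*X1 + a2*X2 + a3*X1*X2
X_unc = np.column_stack([ones, hours, knowledge, hours * knowledge])
# Interaction-centered model: Y = b0 + b1*X1 + b2*X2 + b3*(X1 - m1)*(X2 - m2)
X_cen = np.column_stack([ones, hours, knowledge,
                         (hours - m1) * (knowledge - m2)])

# Ordinary least squares fits of both parameterizations
coef_unc, *_ = np.linalg.lstsq(X_unc, score, rcond=None)
coef_cen, *_ = np.linalg.lstsq(X_cen, score, rcond=None)

# Both parameterizations describe the same fitted plane
same_plane = np.allclose(X_unc @ coef_unc, X_cen @ coef_cen)

# The intercepts differ by the interaction coefficient times the product of the means
gap = coef_cen[0] - coef_unc[0]
predicted_gap = -coef_unc[3] * m1 * m2

print("same fitted plane:", same_plane)
print("uncentered intercept:", coef_unc[0])
print("centered intercept:  ", coef_cen[0])
print("gap vs. -b3*mean(X1)*mean(X2):", gap, predicted_gap)
```

Because the simulated interaction coefficient is negative and both means are positive, the centered intercept should come out above the uncentered one, which is the same kind of bump discussed above.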
I hope this helps!
Jules