Discussions

AlphaParrot8248 · Jun 3, 2022 08:58 AM

I have two stores. Each store sends a notification when it makes a sale. The notification specifies how many units of a specific item were sold.

On some days there are no notifications; on other days there are several. For example, store A sent 45 notifications totaling 78 items in January 2019, compared to 40 notifications totaling 68 items in January 2020.

I receive a monthly report of the notifications. I created a column with the cumulative sales of the item over the course of each year.

I am comparing the cumulative sales of the item between 2019 and 2020. I find a clear decrease in cumulative sales during 2020 because of COVID. Now I would like to compare the two stores with regard to the COVID-related decrease in sales in 2020 (relative to 2019). For example, store A showed an 11% decrease in total sales of the item for 2020 relative to 2019, while store B had only a 6% drop.

I would like a statistical test to compare the decreases. The problem is that the number of notifications differs between months and years for the same store, and between stores for the same time period.

How do I test whether COVID impacted the two stores differently?

dale_lehman · Jun 3, 2022 03:13 PM

My first reaction is that you've already done it. The graphs certainly suggest a difference in how COVID affected sales at the two stores. Conducting a statistical test would add little to those graphs, in my opinion. Of course, the test is not meaningless since you want to know whether the different trajectories are not just random variation. Here I find the graphs a bit strange. I suspect the 2019 lines are fitted relationships - I doubt the data is perfectly linear. So, it is really the background variability of the points around the lines that are a measure of random variation, and whatever test you do will be to compare the different decreases in sales with that variability. I suspect there are a number of ways to do so. One that I'd try is to fit a time series model to the 2019 data (and further back if possible, particularly if there is seasonality in the data), and then see where 2020 falls in relation to a prediction interval. Each store's data should produce a probabilistic measure of how likely the drop in sales would be just due to random variation rather than a COVID effect. You could compare these probabilistic measures for the two stores.
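One way to picture the prediction-interval idea is the minimal Python sketch below. It is not a full time series model: it assumes the simplest possible baseline (a constant monthly mean with independent, roughly normal noise and no seasonality), and the monthly totals are hypothetical placeholders, not the poster's data. The name `prediction_interval` is made up for this illustration.

```python
import math
import statistics

# Hypothetical monthly item totals for one store (NOT the real data).
sales_2019 = [78, 70, 74, 81, 69, 75, 77, 72, 80, 73, 76, 71]
sales_2020 = [68, 72, 40, 12, 30, 55, 67, 70, 74, 69, 73, 75]

# Two-sided 95% Student-t quantile for 11 degrees of freedom (12 months - 1).
T_975_DF11 = 2.201


def prediction_interval(baseline, t_quantile):
    """Prediction interval for ONE new month under a constant-mean model.

    Assumes the baseline months are independent with a common mean and
    variance; a seasonal or trending series needs a richer model.
    """
    n = len(baseline)
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation
    half_width = t_quantile * sd * math.sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


low, high = prediction_interval(sales_2019, T_975_DF11)

# Months of 2020 (1 = January) that fall below what 2019 variability predicts.
below = [month for month, y in enumerate(sales_2020, start=1) if y < low]

print(f"95% prediction interval from 2019: {low:.1f} to {high:.1f}")
print("2020 months below the interval:", below)
```

Running the same calculation for each store gives, per store, the months in which 2020 sales are hard to explain by ordinary month-to-month variation, and those can then be set side by side.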

I'm pretty sure other people can suggest "tests" to perform - but my own inclination is that the graphs may be sufficient for your question - once you add the actual data points into it. If the actual data falls directly on the lines in your graphs, then I think the picture does answer your question without any need for a test.

peng_liu · Jun 3, 2022 10:13 PM

If the interest is in comparing the decrease of total annual sales, I agree with @dale_lehman that you have already done it using the graph.

But there are other hypotheses that you may come up with. You have mentioned: multiple notifications, multiple items, and "the number of notifications are different between month and years for the same store, and between stores for the same time period". So, for example: is the decrease in total annual sales of individual items comparable to the decrease in sales for the entire store? Were individual item sales within and between stores impacted similarly or differently? You could then calculate those quantities at the individual-item level and compare them.

What interests me in the plot is that a homogeneous Poisson process (HPP) model might be appropriate. You drew cumulative quantities, but you mentioned "the number of notifications are different between month and years for the same store, and between stores for the same time period".

Regardless of how irregularly the notifications came in, the cumulative quantities look straight for 2019 and piecewise linear for 2020. That hints to me that an HPP model is a good candidate for 2019, and you can compare recurrence rates (a proxy for monthly sales) between the stores.
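As a rough illustration of comparing recurrence rates outside the Recurrence platform, here is a small Python sketch. It assumes the notifications (not the item totals) follow a homogeneous Poisson process, and it uses an exact conditional binomial test for equal rates, which is a swapped-in textbook technique rather than what the platform reports. The function names are hypothetical. The only numbers used are the January notification counts for store A quoted in the question; the same function applies to two stores over the same window.

```python
import math


def hpp_rate(n_events, exposure):
    """Maximum likelihood estimate of an HPP rate: events per unit of time."""
    return n_events / exposure


def equal_rate_p_value(n_a, exposure_a, n_b, exposure_b):
    """Two-sided exact test that two Poisson processes share one rate.

    Conditional on the total n_a + n_b, the first count is
    Binomial(n, exposure_a / (exposure_a + exposure_b)) under equal rates.
    """
    n = n_a + n_b
    p0 = exposure_a / (exposure_a + exposure_b)

    def log_pmf(k):
        # Work on the log scale so large counts do not overflow.
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p0) + (n - k) * math.log(1 - p0))

    pmf = [math.exp(log_pmf(k)) for k in range(n + 1)]
    observed = pmf[n_a]
    # Sum every outcome that is no more likely than the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))


# Store A, January 2019 vs January 2020, one month of exposure each.
rate_2019 = hpp_rate(45, 1.0)
rate_2020 = hpp_rate(40, 1.0)
p_value = equal_rate_p_value(45, 1.0, 40, 1.0)

print(f"rates per month: {rate_2019:.0f} vs {rate_2020:.0f}, p = {p_value:.3f}")
```

Because each notification can carry more than one item, a test on item totals would need to account for that extra variability; this sketch only speaks to how often notifications arrive.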

More interesting to me are the curves for 2020. Although they are not straight over the entire year, they appear piecewise linear to me, and I have marked the segments with numbers in the following screenshot. Looking closer, period 1 has a slight uptick compared to the same period in 2019, which means a higher recurrence rate. That is one possible hypothesis to test. Next, period 3 is slightly off from (rather than completely parallel to) the same period in 2019, which means a lower recurrence rate. That is another hypothesis. Period 2 clearly indicates that Store A almost shut down. Period 4 suggests testing whether Store B started 2020 at the same sales performance as in 2019. Period 5 suggests an impact, but not as bad as what happened to Store A; that is another hypothesis. In period 6, I see a slight uptick for Store B compared to the same period in 2019 (later in 2020, Store B was doing better than in the same period of 2019). The difference is very slight, but it suggests a hypothesis as well. To fit an HPP, you need to use the Recurrence platform.
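Each of the period-level hypotheses above can be phrased as a rate ratio (2020 versus the same window in 2019), and the original question then becomes whether that ratio differs between the stores. The Python sketch below shows one hedged way to do this with a Wald test on the log scale. It assumes Poisson notification counts, nonzero counts in every window, and equal window lengths unless exposures are passed in. All counts are hypothetical, and the function names are made up for this illustration.

```python
import math
from statistics import NormalDist


def log_rate_ratio(n_2020, n_2019, exposure_2020=1.0, exposure_2019=1.0):
    """Log of the 2020/2019 rate ratio and its Wald standard error.

    Assumes Poisson counts; both counts must be greater than zero.
    """
    estimate = math.log((n_2020 / exposure_2020) / (n_2019 / exposure_2019))
    std_error = math.sqrt(1 / n_2020 + 1 / n_2019)
    return estimate, std_error


def two_sided_p(estimate, std_error):
    """Two-sided normal-approximation p-value for estimate == 0."""
    z = estimate / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical notification counts for the worst-hit window of each store
# (for example period 2 for Store A and period 5 for Store B).
lrr_a, se_a = log_rate_ratio(n_2020=30, n_2019=120)
lrr_b, se_b = log_rate_ratio(n_2020=80, n_2019=110)

# Did each store change relative to its own 2019?
p_a = two_sided_p(lrr_a, se_a)
p_b = two_sided_p(lrr_b, se_b)

# Did the two stores change by different amounts? (ratio of rate ratios)
diff = lrr_a - lrr_b
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
p_diff = two_sided_p(diff, se_diff)

print(f"Store A rate ratio: {math.exp(lrr_a):.2f} (p = {p_a:.2g})")
print(f"Store B rate ratio: {math.exp(lrr_b):.2f} (p = {p_b:.2g})")
print(f"Ratio of ratios: {math.exp(diff):.2f} (p = {p_diff:.2g})")
```

The last comparison is the one that addresses whether COVID hit the two stores differently. If the counts are overdispersed relative to Poisson, the standard errors here will be too small.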

AlphaParrot8248 · Jun 9, 2022 07:31 AM

Thank you both. I will study your answers and get back to you as needed.


How to statistically compare a change between two curves with the change between two other curves.
