Solved: Re: Help with comparing two sets of data

jennison_floyd · Jun 8, 2023 5:16 PM

I'm sure my problem is laughably simple, but I really need help! This is well beyond my skill level.

I have two sets of data, 2015 and 2016, of ER visit counts for different syndromes (asthma and general respiratory) and of particulate matter (PM2.5) pollution levels. I want to know the relationship between PM2.5 and ER visits. Specifically, is there any difference between the 2015 and 2016 data, and if so, is it statistically significant?

txnelson · Feb 28, 2017 01:55 PM

What you need to do is to run a regression with ER as your Y, and PM and Year as your predictors.

To do this,

1. Stack the PM data and the ER data columns (Tables==>Stack). The Stack platform lets you stack multiple series at once.

2. Rename the data column for the PM data to PM, and the ER data column to ER.

3. Create a new column. Give it a formula that strips the Year value off the "Label" column that was created when stacking the data. Use the formula:

word(-1,:Label)

4. Then go to

Analyze==>Fit Model

5. Specify ER as the Y (Response) column, place PM and Year in the Model Effects area, and click Run.
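For anyone working outside JMP, the five steps above can be sketched in plain Python. This is a minimal sketch under stated assumptions: the wide-format column names ("PM2.5 2015", "ER 2016", and so on), the helper names stack, last_word and ols, and the toy numbers are all hypothetical. Year is coded 0/1 here by assumption; JMP's own parameterization of a nominal Year effect may differ, though the fitted values do not. The sketch produces only the coefficient estimates, so use the Fit Model report for standard errors and p-values.

```python
# Sketch of steps 1-5 outside JMP, for illustration only.
# Hypothetical column names ("PM2.5 2015", "ER 2016", ...) and toy numbers.

def stack(rows, series):
    """Multiple-series stack. `series` maps each new column name to its
    source columns in matching order, e.g. {"PM": [...], "ER": [...]}."""
    names = list(series)
    out = []
    for row in rows:
        for k, label in enumerate(series[names[0]]):
            new = {name: row[series[name][k]] for name in names}
            new["Label"] = label  # source column name of the first series
            out.append(new)
    return out


def last_word(text):
    """Python analogue of the JMP formula word(-1, :Label)."""
    return text.split()[-1]


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    coef = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][k] * coef[k] for k in range(i + 1, p))
        coef[i] = (b[i] - tail) / A[i][i]
    return coef


wide = [
    {"PM2.5 2015": 10, "PM2.5 2016": 12, "ER 2015": 32, "ER 2016": 43},
    {"PM2.5 2015": 8, "PM2.5 2016": 15, "ER 2015": 26, "ER 2016": 52},
    {"PM2.5 2015": 11, "PM2.5 2016": 9, "ER 2015": 35, "ER 2016": 34},
    {"PM2.5 2015": 14, "PM2.5 2016": 20, "ER 2015": 44, "ER 2016": 67},
]
stacked = stack(wide, {"PM": ["PM2.5 2015", "PM2.5 2016"],
                       "ER": ["ER 2015", "ER 2016"]})
for r in stacked:
    r["Year"] = last_word(r["Label"])  # step 3: Year from the Label column

# Step 5: ER ~ intercept + PM + Year, with Year coded 0 (2015) / 1 (2016).
X = [[1.0, r["PM"], 1.0 if r["Year"] == "2016" else 0.0] for r in stacked]
y = [r["ER"] for r in stacked]
intercept, b_pm, b_year = ols(X, y)
```

The toy ER values were built as 2 + 3*PM, plus 5 for the 2016 rows, so the fit recovers an intercept of 2, a PM slope of 3 and a Year effect of 5, up to rounding.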

Jim


txnelson · Feb 27, 2017 8:30 PM

The primary issue is that your data needs to be in a slightly different format. Your table has each year's data in a separate column, even though both columns measure the same thing. To get the measurements into one column, with a second column specifying the year, all you have to do is stack the columns. Go to:

Tables==>Stack

In the dialog box that appears, specify which columns to stack.

For clarity, I changed the default name of the new stacked column that contains the data from "Data" to "PM2.5", and named the column that indicates which of the stacked columns each PM2.5 value came from "Year". Below is the result of the stack:

Now the data can be analyzed. The platform to use for this is the Fit Y by X platform.

Analyze==>Fit Y by X

In the dialog box that appears, place the PM2.5 column in the Y, Response box and the Year column in the X, Factor box, then click OK.

JMP first produces a graph to help you understand the relationship. To request a t-test, which is the appropriate test for what you are asking, click the red triangle at the top left of the report and select the t Test option.

The t-test results will be added to the current report.

The difference between the two groups is statistically significant (p < .0001).

If you look under the red triangle, you will see many other options you may want to add to your report.
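For a rough cross-check outside JMP, the stack-then-compare steps above can be sketched in Python. This is a minimal sketch that assumes hypothetical source columns named "2015" and "2016" and uses toy numbers; the helper names are illustrative. As far as I know, JMP's t Test option in Fit Y by X does not assume equal variances, so the sketch computes Welch's t statistic with Welch-Satterthwaite degrees of freedom. It stops short of the p-value, which JMP reports (outside JMP, scipy.stats.t.sf would give the tail probability).

```python
import math
import statistics


def stack_series(rows, columns, data_name="PM2.5", id_name="Year"):
    """Single-series stack: one output row per (input row, source column);
    the id column records which source column each value came from."""
    return [{data_name: row[c], id_name: c} for row in rows for c in columns]


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch),
    with Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy wide table (hypothetical): one column per year.
wide = [{"2015": 8.0, "2016": 14.0},
        {"2015": 10.0, "2016": 18.0},
        {"2015": 12.0, "2016": 16.0}]
stacked = stack_series(wide, ["2015", "2016"])

# Group the stacked PM2.5 values by Year, as Fit Y by X does with the X factor.
groups = {}
for r in stacked:
    groups.setdefault(r["Year"], []).append(r["PM2.5"])
t, df = welch_t(groups["2015"], groups["2016"])
```

With these toy values both groups have a variance of 4, so t = 6 / sqrt(8/3), about 3.67, on 4 degrees of freedom.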

I suggest you take a look at the JMP documentation provided when you installed JMP:

Help==>Books

I recommend the Discovering JMP and Using JMP books.

Jim

jennison_floyd · Feb 27, 2017 11:49 PM

Thank you very much for your response and help. With your examples I was able to stack my other data as well and see similar relationships. So for all of the data, the 2015 and 2016 values are statistically different.

My question now is: how can I see whether the relationship between PM2.5 (X factor) and ER visits (Y response) in 2016 is statistically different from the relationship between them in 2015? Is this a multivariate analysis? I'm just not sure how to approach this.

txnelson · Feb 28, 2017 08:34 AM

In your data, there is a "Date" column. All of the dates are for 2016. What does that represent? Are the 2015 data listed for row 1 for 09/01/2015? And if so, are you thinking that you want to match the data based upon calendar date? Can you please clarify?

Jim

jennison_floyd · Feb 28, 2017 6:26 AM

I'm sorry for the confusion. The dates run from September 1st to December 30th in both 2016 and 2015. You are correct: the 2015 value in row 1 is for 09/01/2015, and so on, matching the first Date column.

And yes, I was thinking of matching the data by calendar date. I am trying to see whether any spikes in ER visits on certain dates relate to spikes in PM2.5 on the same dates, using the 2015 data as a control for an air pollution event that happened in 2016. I am not sure how to accomplish this, though.

txnelson · Feb 28, 2017 10:21 AM

Given your new information, my previous suggestion is probably invalid, because I assumed non-matched data. If you consider matching on date to be valid, then you need to run a matched pairs t-test:

Analyze==>Specialized Modeling==>Matched Pairs

Before that, though, I would investigate the matching on dates to validate your assumption.
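A matched pairs t-test works on the per-date differences, so the dates only serve to line the two years up. The sketch below is a minimal illustration, not JMP's implementation. It pairs 2015 and 2016 values by month and day and computes the paired t statistic. The helper names and the numbers are hypothetical.

```python
import math
import statistics
from datetime import date


def match_by_calendar_day(s2015, s2016):
    """Pair values whose dates share the same month and day.
    Each input maps datetime.date -> value; unmatched dates are dropped."""
    by_day = {(d.month, d.day): v for d, v in s2015.items()}
    pairs = []
    for d, v in sorted(s2016.items()):
        key = (d.month, d.day)
        if key in by_day:
            pairs.append((by_day[key], v))  # (2015 value, 2016 value)
    return pairs


def paired_t(pairs):
    """Matched pairs t statistic on the differences (2016 - 2015),
    with n - 1 degrees of freedom."""
    diffs = [b - a for a, b in pairs]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Toy ER counts (hypothetical) for the same four calendar days.
er_2015 = {date(2015, 9, 1): 20, date(2015, 9, 2): 22,
           date(2015, 9, 3): 19, date(2015, 9, 4): 25}
er_2016 = {date(2016, 9, 1): 24, date(2016, 9, 2): 25,
           date(2016, 9, 3): 24, date(2016, 9, 4): 28}
pairs = match_by_calendar_day(er_2015, er_2016)
t, df = paired_t(pairs)
```

Matching on (month, day) instead of the full date is what lets 09/01/2015 pair with 09/01/2016. Whether that pairing is justified is exactly the question Jim raises next.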

Jim

jennison_floyd · Feb 28, 2017 11:52 AM

Both PM2.5 and ER visits are correlated with the date, using Fit Y by X. So is it suitable to use the matched pairs t-test, with PM2.5 and ER visits as Y and Date as X?

txnelson · Feb 28, 2017 12:48 PM

If that is the data table you included in your first description, I have to disagree with your findings. Neither PM2.5 nor ER visits has a significant correlation across years, which indicates there is no support for matching.

Neither RSquare value is significant at the .05 level.
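Jim's check amounts to fitting the 2016 values against the 2015 values for the same calendar dates. In a one-predictor linear fit, RSquare equals the squared correlation r squared, and its significance test is equivalent to the t test on r. The minimal sketch below computes both. The helper names and the toy series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for H0: no linear correlation, with n - 2 df.
    Equivalent to the F test on RSquare in a one-predictor fit (F = t**2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2


# Toy values for the same four calendar dates in each year (hypothetical).
er_2015 = [20, 22, 19, 25]
er_2016 = [24, 25, 24, 28]
r = pearson_r(er_2015, er_2016)
t, df = correlation_t(r, len(er_2015))
rsquare = r * r
```

With real data you would compare t against a t distribution with df degrees of freedom (JMP does this for you). A non-significant result, as Jim found, argues against treating the dates as matched.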

Jim

jennison_floyd · Feb 28, 2017 12:57 PM

Oh, I'm sorry, I used PM2.5 2016 against Date 2016, and then PM2.5 2015 against Date 2015. I appreciate your patience with me. So this rules out the paired t-test, since the years are not correlated, correct? So I suppose that takes me back to your original suggestion of stacking the data as non-matched? How can I analyze the relationship between PM2.5 levels and ER visit counts across the years (with 2015 as a control)? I can run Fit Y by X on 2016 PM2.5 levels and 2016 ER visits, and also for 2015, but how can I compare these two results to see if the differences are significant?

txnelson · Feb 28, 2017 01:55 PM