JMP User Community: Discussions

Help with comparing two sets of data

Feb 27, 2017 7:42 PM
(4754 views)

I'm sure my problem is laughably simple, but I really need help! This is very much out of my skill level.

I have two sets of data: 2015 and 2016 data for ER visit counts for different syndromes (asthma and general respiratory) and particulate matter pollution levels. I want to know the relationship between PM2.5 and ER visits. Specifically, is there any difference between the 2015 and 2016 data, and if so, is it statistically significant?

1 ACCEPTED SOLUTION

Feb 28, 2017 10:55 AM
(9386 views)

Solution

What you need to do is to run a regression with ER as your Y, and PM and Year as your predictors.

To do this,

1. Stack the PM data and the ER data columns. The Stack Platform will allow you to do multiple series stacking.

2. Rename the data column for the PM data to PM, and the ER data column to ER

3. Create a new column. Specify a formula for the new column that strips off the Year value from the column "Label" which was created when stacking the data. Use the formula:

word(-1,:Label)

4. Then go to

Analyze==>Fit Model

5. Specify ER as the Y(Response) column and then place PM and Year in the Model Effects Area
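For readers who want to see what the steps above compute, here is a minimal sketch in Python rather than JMP. Everything in it is an assumption for illustration: the label strings, the made-up numbers, and the helper names `last_word`, `solve`, and `fit_ols` are hypothetical, and coding Year as a 0/1 indicator is one common choice, not necessarily how Fit Model parameterizes a nominal effect. It fits ER = b0 + b1*PM + b2*(Year is 2016) by ordinary least squares.

```python
def last_word(label):
    """Rough analogue of the JSL formula word(-1, :Label): keep the last word."""
    return label.split()[-1]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up stacked data: (Label, PM, ER). Not the poster's data.
stacked = [
    ("PM 2015", 1.0, 12.0), ("PM 2015", 2.0, 14.0), ("PM 2015", 3.0, 16.0),
    ("PM 2016", 2.0, 19.0), ("PM 2016", 4.0, 23.0), ("PM 2016", 6.0, 27.0),
]

# Design matrix columns: intercept, PM, indicator for Year == "2016".
design = [[1.0, pm, 1.0 if last_word(label) == "2016" else 0.0]
          for label, pm, _ in stacked]
er = [row[2] for row in stacked]

intercept, pm_slope, year_shift = fit_ols(design, er)
print(f"intercept={intercept:.2f} PM slope={pm_slope:.2f} 2016 shift={year_shift:.2f}")
```

In this toy data the Year coefficient is the shift in ER visits between years at a given PM level. Testing whether the PM slope itself differs between years would take a PM-by-Year interaction term, which the steps above do not include.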

Jim

10 REPLIES

Feb 27, 2017 8:25 PM
(4745 views)

The primary issue is that your data needs to be in a slightly different format. Your data has the data for the different years in 2 separate columns, even though what is being measured is the same thing. To get the data into one column for the measurement, and a second column to specify which year, all that has to be done is to stack the columns. Go to:

Tables==>Stack

In the dialog box that appears, specify which columns to stack.

For clarity, I changed the default name of the new stacked column that will contain the data from "Data" to "PM2.5", and named the column that indicates which source column each PM2.5 value came from "Year". The result of the stack is a table with one row per measurement, holding the PM2.5 value and the year it came from.
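As a rough illustration outside JMP, the same wide-to-long reshaping can be sketched in plain Python. The `stack` helper, the source column names "2015" and "2016", and the values are all hypothetical, chosen only to show the shape of the result.

```python
def stack(rows, columns, data_name, label_name):
    """Turn one row with several measurement columns into one row per measurement."""
    stacked = []
    for row in rows:
        # Columns that are not being stacked are carried along unchanged.
        kept = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            stacked.append({**kept, label_name: col, data_name: row[col]})
    return stacked


# Made-up wide table with one PM2.5 column per year (not the poster's data).
wide = [
    {"Date": "09/01", "2015": 8.0, "2016": 12.5},
    {"Date": "09/02", "2015": 9.5, "2016": 30.0},
]

long_rows = stack(wide, ["2015", "2016"], data_name="PM2.5", label_name="Year")
for rec in long_rows:
    print(rec)
```

Each original row becomes two rows, one per year, which is the layout that Fit Y by X expects.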

Now the data can be analyzed. The platform to use for this is the Fit Y by X platform.

Analyze==>Fit Y by X

In the dialog box that appears, place the PM2.5 column in the Y Response selection box and the Year column in the X Factor selection box, then click the OK button.

JMP first produces a graph to help in understanding the relationship. To request a t-test, which would be the appropriate test for what you are asking, click on the red triangle at the top left of the graph and select the t-test option.

The t-test results will be added to the current graphical display.

The two groups are statistically different: the t-test p-value is less than .0001.
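To make the test concrete, here is a minimal sketch of a two-sample t statistic in Python. The post does not say which variant of the t-test JMP reports, so using the unequal-variance (Welch) form here is an assumption, and the numbers are made up. The sketch stops at the statistic and degrees of freedom, because the Python standard library has no t distribution for the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t-test without assuming equal variances."""
    # Squared standard error contributed by each group.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up PM2.5 readings for the two years (not the poster's data).
pm_2015 = [1.0, 2.0, 3.0, 4.0, 5.0]
pm_2016 = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, df = welch_t(pm_2015, pm_2016)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```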

If you look under the red triangle, you will see many other items you may want to add to your display

I suggest you take a look at the JMP documentation provided when you installed JMP:

Help==>Books

I recommend you look at the Discovering JMP book and the Using JMP Book

Jim

Feb 27, 2017 8:49 PM
(4741 views)

My question now becomes: how can I see whether the relationship between PM2.5 (X factor) and ER visits (Y response) in 2016 is statistically different from the relationship between the two in 2015? Is this a multivariate analysis? I'm just not sure how to approach this.

Feb 28, 2017 5:34 AM
(4729 views)

In your data, there is a "Date" column. All of the dates are for 2016. What does that represent? Are the 2015 data listed for row 1 for 09/01/2015? And if so, are you thinking that you want to match the data based upon calendar date? Can you please clarify?

Jim

Feb 28, 2017 6:24 AM
(4725 views)

And yes, I was thinking of matching the data based upon calendar date. I am trying to see whether any spikes in ER visits on certain dates relate to spikes in PM2.5 on the same dates, and to use the 2015 data as a control for an air pollution event that happened in 2016. I am not sure how to accomplish this, though.

Feb 28, 2017 7:21 AM
(4719 views)

Given your new information, my previous suggestion is probably invalid. I assumed unmatched data. If you consider matching on date a valid match, then you need to run a Matched Pairs t-test:

Analyze==>Specialized Modeling==>Matched Pairs

But before that, I would investigate the matching on dates to validate your assumption.
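For comparison with the unmatched test, a paired t-test works on the date-by-date differences. The following is a minimal sketch in Python with made-up numbers. The helper name `paired_t` is hypothetical, and it assumes the two lists are already aligned so that position i in each list is the same calendar date.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired t-test on two equally long, aligned samples."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    # The test is a one-sample t-test on the pairwise differences.
    diffs = [xi - yi for xi, yi in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Made-up ER visit counts on the same four calendar dates in each year.
er_2016 = [5.0, 7.0, 9.0, 11.0]
er_2015 = [4.0, 5.0, 8.0, 8.0]

t_stat, df = paired_t(er_2016, er_2015)
print(f"t = {t_stat:.3f}, df = {df}")
```

Pairing only helps if the values for the same date really move together across years, which is why the matching assumption is worth checking first.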

Jim

Feb 28, 2017 8:52 AM
(4705 views)

Feb 28, 2017 9:48 AM
(4703 views)

If this is based on the data table you included in your first description, I have to disagree with your findings. Neither PM nor ER visits has a significant correlation across years, which indicates there is no support for matching.

Neither RSquare value is significant at the .05 level.
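One way to read this check, sketched in Python under stated assumptions: line up the 2015 and 2016 values by date, compute their Pearson correlation, and square it to get the RSquare of a simple regression of one year on the other. The post does not show exactly how the RSquare values were obtained, so this is only an interpretation; the data and the helper name `pearson_r` are made up.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up values for the same five calendar dates in each year.
pm_2015 = [1.0, 2.0, 3.0, 4.0, 5.0]
pm_2016 = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(pm_2015, pm_2016)
r_square = r ** 2
print(f"r = {r:.2f}, RSquare = {r_square:.2f}")
```

A low RSquare between years means a value on a given date in 2015 says little about the same date in 2016, so treating the dates as matched pairs gains nothing.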

Jim

Feb 28, 2017 9:57 AM
(4701 views)
