Solved: Re: Comparing means for implicit data in JMP

mikethejumper · Jan 8, 2014 04:02 PM

Hi all,

I have a data table with columns showing the frequency of events (in my case, skiing accidents in different regions of the United States). The first column contains the region names. I want to compare the means of Males and Females (Male and Female are two other columns in the data table) who had skiing accidents, irrespective of region. I know Fit Y by X can do this, but I don't have a Y, as all the numbers I have represent the number of cases. Is there a way I can compare means between Males and Females?

Thanks in advance.

Mike

Jeff_Perkinson · Jan 9, 2014 04:10 PM

Hi Mike,

You need to use Tables -> Stack to get a data table similar to mine with a column for Region, Gender and Accidents. Then you can use Fit Y by X to get your analysis.

Start with your table and in the Stack dialog add the Male and Female columns to Stack Columns and then name the Stacked Data Column "Accidents" and the Source Label Column "Gender".

You'll get a table like this.

Then you can use Fit Y by X with Gender as your X and Accidents as your Y.

You'll find the analysis options under the Red Triangle at the top.

Let us know how you make out.

-Jeff


Jeff_Perkinson · Jan 9, 2014 03:02 PM

I think you're saying that your data looks something like this:

Mountain	Gender	Accidents
Snowmass	Male	5
Alta	Male	10
Breckenridge	Male	5
Copper Basin	Male	4
Snowmass	Female	7
Alta	Female	2
Breckenridge	Female	9
Copper Basin	Female	7

I'm not quite clear on the analysis you want to do but you can use the Accidents column in the Freq role to indicate that this column is a count of how many times this row occurs.
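A minimal JSL sketch of the Freq role idea, assuming the long layout shown above. The table name and the choice of a Distribution report on Gender are illustrative assumptions, since the intended analysis was not yet settled at this point in the thread.

```jsl
// Long-format table copied from the example above
dt = New Table( "Accidents by Mountain",
	Add Rows( 8 ),
	New Column( "Mountain", Character, Nominal,
		Set Values( {"Snowmass", "Alta", "Breckenridge", "Copper Basin",
			"Snowmass", "Alta", "Breckenridge", "Copper Basin"} )
	),
	New Column( "Gender", Character, Nominal,
		Set Values( {"Male", "Male", "Male", "Male",
			"Female", "Female", "Female", "Female"} )
	),
	New Column( "Accidents", Numeric, Continuous,
		Set Values( [5, 10, 5, 4, 7, 2, 9, 7] )
	)
);

// With Accidents in the Freq role, each row counts as that many observations
dt << Distribution(
	Nominal Distribution( Column( :Gender ) ),
	Freq( :Accidents )
);
```

The same Freq( :Accidents ) argument can be passed to other platforms, so the table does not need to be expanded to one row per accident.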

Let me know if I've misinterpreted how your data is laid out, and if you can clarify what question you're trying to answer we can try to point you to the appropriate analysis.

-Jeff

mikethejumper · Jan 9, 2014 03:28 PM

Thanks, Jeff. I apologize for the lack of clarity in the question. My data looks something like this. All the numbers inside the table represent the number of accidents reported.

Region	Male	Female
Alaska	25	5
Wisconsin	10	4
Illinois	5	3
NYC	2	2
Detroit	9	1
Jersey city	15	0

Now, I would like to see if the mean number of male accidents is significantly different from the mean number of female accidents, irrespective of region. The problem is that I don't have a Y variable, because the accident counts are implicitly embedded in the table; in other words, I don't have an explicit "accidents" column. I hope I am a little bit clearer this time :). I am an engineer by profession, so apologies for my illiteracy in stats.

Mike.

ms · Jan 9, 2014 03:54 PM

With that layout you can try the Matched Pairs analysis platform (add the Male and Female columns as Y). However, you'll have more options if you stack the columns (Stack in the Tables menu). Then you can use the Fit Y by X platform to compare means, with Gender as X and the count data as Y. The variance appears higher for males, so you may want to look at a nonparametric method; these are found in the red triangle menu in the Fit Y by X results window.

Here's an example script that does the above (paste into a script window and hit run!):

// Example table
dt = New Table( "Accidents",
	Add Rows( 6 ),
	New Column( "Region",
		Character,
		Nominal,
		Set Values( {"Alaska", "Wisconsin", "Illinois", "NYC", "Detroit", "Jersey city"} )
	),
	New Column( "Male",
		Numeric,
		Continuous,
		Set Values( [25, 10, 5, 2, 9, 15] )
	),
	New Column( "Female",
		Numeric,
		Continuous,
		Set Values( [5, 4, 3, 2, 1, 0] )
	)
);

// Stack table
dt_stacked = dt << Stack(
	columns( :Male, :Female ),
	Source Label Column( "Gender" ),
	Stacked Data Column( "N Accidents" )
);

// Compare means
dt_stacked << Oneway( Y( :N Accidents ), X( :Gender ), t Test( 1 ), Wilcoxon Test( 1 ) );
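The Matched Pairs alternative mentioned at the start of this post can be sketched in a similar way. This is only an illustration on the same example values; it rebuilds the wide table so it can be run on its own, and the table name is an arbitrary choice.

```jsl
// Wide-format table: one row per region, with Male and Female counts side by side
dt = New Table( "Accidents (wide)",
	Add Rows( 6 ),
	New Column( "Region", Character, Nominal,
		Set Values( {"Alaska", "Wisconsin", "Illinois", "NYC", "Detroit", "Jersey city"} )
	),
	New Column( "Male", Numeric, Continuous, Set Values( [25, 10, 5, 2, 9, 15] ) ),
	New Column( "Female", Numeric, Continuous, Set Values( [5, 4, 3, 2, 1, 0] ) )
);

// Matched Pairs treats each region as a pair and tests
// whether the mean within-region difference is zero
dt << Matched Pairs( Y( :Male, :Female ) );
```

Unlike the Oneway comparison on the stacked table, this keeps the pairing by region, so differences between regions do not add to the error term.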

reeza · Jan 9, 2014 04:04 PM

Wouldn't you just be comparing two numbers then, the total number of males vs females, if you're not interested in region?

This is a flawed analysis though: you really need the number of accidents per skier-day. Otherwise busier hills will always have more accidents, and since more males generally ski, there will be more male accidents.
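For reference, the two totals in question can be computed directly from the wide table. This is a small sketch on the example values posted earlier in the thread; the variable names are made up for illustration, and no skier-day data is available here, so these are raw counts only.

```jsl
// Wide table with the counts from the earlier post
dt = New Table( "Accidents (totals)",
	Add Rows( 6 ),
	New Column( "Region", Character, Nominal,
		Set Values( {"Alaska", "Wisconsin", "Illinois", "NYC", "Detroit", "Jersey city"} )
	),
	New Column( "Male", Numeric, Continuous, Set Values( [25, 10, 5, 2, 9, 15] ) ),
	New Column( "Female", Numeric, Continuous, Set Values( [5, 4, 3, 2, 1, 0] ) )
);

// Column totals across all regions (illustrative variable names)
maleTotal = Col Sum( dt:Male );     // 25 + 10 + 5 + 2 + 9 + 15 = 66
femaleTotal = Col Sum( dt:Female ); // 5 + 4 + 3 + 2 + 1 + 0 = 15

// Print both totals to the log
Show( maleTotal, femaleTotal );
```

Dividing each count by a matching exposure column (skier-days per region and gender, if it were recorded) would give the rates that a fair comparison needs.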

mikethejumper · Jan 9, 2014 05:36 PM

Jeff,

Thanks for the input. I was not aware of such a powerful and robust command!

Mike

Reeza,

Thanks for the input. Agreed, the data must be normalized by a characteristic quantity to get a sensible prediction. Thanks.

Mike
