Bearded
Level I

Help a novice with regression and chi-square! JMP

Hey all!

I am at my wits' end and could really use a hand completing a couple of analyses in JMP. I'm taking a stats class, and unfortunately I'm having a hard time with it.

 

I have attached a cleaned-up data file I created from a U.K. accident data set, and my hypotheses are as follows:

 

1. The number of passengers in an automobile will be positively correlated with the number of casualties caused by the accident.

 

2. Male drivers will cause more severe accidents

 

3. A higher number of car passengers will yield more fatal accidents.

 

 

The problem I have centers around which type of analysis to use and how to go about it with JMP (as I'm learning this software from scratch).

 

For the first hypothesis, I started with a correlation, and then attempted a regression and got this:

new1.PNG

Is this correct? I don't quite understand what this data means.

 

For the second hypothesis, I understand that a chi-square analysis is used and I think so far I got that taken care of.

However, for the third analysis I'm really stumped.

 

A higher number of car passengers will yield more fatal accidents. For this I think I'm attempting to regress a discrete variable (car passengers) with an ordinal variable (degree of fatality)?

 

And when I attempt to run a regression on that, I get a bunch of stuff. I'm pretty sure I'm wrong.

 

Could anyone take the time to walk me through this?

Thanks in advance!

 

 

1 ACCEPTED SOLUTION

gzmorgan0
Super User

Re: Help a novice with regression and chi-square! JMP

FinalData2.jmp has 136,521 unique records. Accident Index is a character column and should not be converted to numeric. Because some of the indices contain characters, such as "UD", converting them to numeric produces a large number, something like e+128. FinalData1.jmp has two very important columns, Vehicle Reference and Casualty Reference. When an accident report in FinalData2.jmp lists 3 vehicles and 2 casualties, there will be 3 additional rows, one for each vehicle, with references 1, 2, 3. There could be 1 casualty in Vehicle 1, 1 in Vehicle 2, and 0 in Vehicle 3.

 

To get the total number of passengers from FinalData1.jmp joined with FinalData2.jmp, here are the steps:

  • Click on FinalData1.jmp. Add a new column (I called it Passengers+Driver) with the formula :Car_Passenger + 1.
  • From the main menu, select Tables > Summary. Select Accident Index and press the Group button. Then select the new column Passengers+Driver, press the Statistics button, and select Sum. I entered Passengers as the Output Table Name. Select OK.

image.png

  • Next, select FinalData2.jmp. From the main menu, select Tables > Join, select Passengers as the table to join with, and match on Accident Index. Here you can select which columns you want to keep.

image.png
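For anyone who wants to see the logic of the Summary and Join steps outside the JMP dialogs, here is a minimal plain-Python sketch. The rows are invented toy values, "Number of Casualties" is a hypothetical column name, and the other dictionary keys only mirror the column names mentioned above. It is not the script JMP generates, just the same group-sum-then-match idea.

```python
from collections import defaultdict

# Toy stand-in for FinalData1.jmp: several rows per accident (values are invented).
detail_rows = [
    {"Accident Index": "A001", "Vehicle Reference": 1, "Car_Passenger": 2},
    {"Accident Index": "A001", "Vehicle Reference": 2, "Car_Passenger": 0},
    {"Accident Index": "UD002", "Vehicle Reference": 1, "Car_Passenger": 1},
]

# Toy stand-in for FinalData2.jmp: one row per accident (values are invented,
# and "Number of Casualties" is a hypothetical column name).
accident_rows = [
    {"Accident Index": "A001", "Number of Casualties": 3},
    {"Accident Index": "UD002", "Number of Casualties": 1},
]


def summarize_occupants(rows):
    """Sum (Car_Passenger + 1) per Accident Index, like Summary with Group and Sum."""
    totals = defaultdict(int)
    for row in rows:
        # The index stays a string key, so IDs such as "UD002" are never coerced to numbers.
        totals[row["Accident Index"]] += row["Car_Passenger"] + 1
    return dict(totals)


def join_on_accident(accidents, totals):
    """Attach the per-accident total to each accident row, matching on Accident Index."""
    joined = []
    for row in accidents:
        key = row["Accident Index"]
        if key in totals:  # keep only accidents that have a matching summary row
            joined.append({**row, "Sum(Passengers+Driver)": totals[key]})
    return joined


totals = summarize_occupants(detail_rows)
joined = join_on_accident(accident_rows, totals)
for row in joined:
    print(row)
```

Keeping the accident index as a string key is the same point made above about not converting Accident Index to numeric.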

The joined table is attached and there is a script on the table that shows the bivariate density and correlation.
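As a rough illustration of what a correlation between occupants and casualties measures, here is a small plain-Python sketch of the Pearson correlation coefficient. The two lists are invented toy values, not numbers from the attached table, and the function name is my own. JMP computes this for you, so this is only meant to show the arithmetic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented toy values: occupants per accident and casualties per accident.
occupants = [1, 2, 3, 4, 5]
casualties = [1, 1, 2, 2, 3]

r = pearson_r(occupants, casualties)
print(round(r, 3))
```

A value near +1 would support hypothesis 1, a value near 0 would not. It says nothing about cause.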

 

However, neither data table has a column for the sex of the driver. You could use FinalData2.jmp to select rows where there is only 1 vehicle, then join with FinalData1.jmp where there are zero passengers and only 1 fatality. For that restricted set of accidents, you can assume the sex of the casualty is the sex of the driver. Maybe one of the accident codes assigns gender, but nothing is obvious in the tables.
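The restriction described above can be sketched in plain Python as a filter followed by a match on Accident Index. Everything here is an assumption for illustration: the rows are invented, the column names "Number of Vehicles", "Number of Casualties", and "Sex of Casualty" are hypothetical, and "only 1 fatality" is read as "exactly one casualty".

```python
def likely_driver_casualties(accidents, casualties):
    """Return casualty rows that are plausibly the driver.

    Keeps casualties from accidents with exactly one vehicle and one casualty,
    where the casualty is not recorded as a car passenger.
    """
    eligible = {
        a["Accident Index"]
        for a in accidents
        if a["Number of Vehicles"] == 1 and a["Number of Casualties"] == 1
    }
    return [
        c for c in casualties
        if c["Accident Index"] in eligible and c["Car_Passenger"] == 0
    ]


# Invented toy rows; all column names besides Accident Index and Car_Passenger are hypothetical.
accidents = [
    {"Accident Index": "A001", "Number of Vehicles": 1, "Number of Casualties": 1},
    {"Accident Index": "A002", "Number of Vehicles": 2, "Number of Casualties": 1},
    {"Accident Index": "A003", "Number of Vehicles": 1, "Number of Casualties": 1},
]
casualties = [
    {"Accident Index": "A001", "Car_Passenger": 0, "Sex of Casualty": "Male"},
    {"Accident Index": "A002", "Car_Passenger": 0, "Sex of Casualty": "Female"},
    {"Accident Index": "A003", "Car_Passenger": 1, "Sex of Casualty": "Female"},
]

drivers = likely_driver_casualties(accidents, casualties)
print(drivers)
```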


4 REPLIES
gzmorgan0
Super User

Re: Help a novice with regression and chi-square! JMP

This is called a slightly cleaned data set, but there are data anomalies that need to be explained before it is analyzed:

  • The data table has 161,087 rows. Of those, 81,540 rows had no accident number (36,777 rows have other information, but 44,763 rows are all empty).
  • After deleting the all-empty rows, 116,324 rows remain, 36,777 of them with no Accident #. There are also numerous rows with weird Accident numbers, for example row 45928-45922; there are 92 rows with these extreme numbers.

image.png

  • There are accident numbers with multiple rows. I thought maybe the # of rows assigned to the same accident might match the # of cars involved, but that is not the case. 
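A check like the one behind the counts above can be sketched in plain Python. The rows and the "Sex" and "Severity" column names are invented for illustration, and the function names are my own. The idea is simply to separate rows that are entirely empty from rows that have data but no accident number.

```python
def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def classify_rows(rows, id_column="Accident Index"):
    """Count all-empty rows, rows missing only the ID, and rows that have an ID."""
    counts = {"all_empty": 0, "missing_id_only": 0, "has_id": 0}
    for row in rows:
        if all(is_blank(v) for v in row.values()):
            counts["all_empty"] += 1
        elif is_blank(row.get(id_column)):
            counts["missing_id_only"] += 1
        else:
            counts["has_id"] += 1
    return counts


# Invented toy rows; "Sex" and "Severity" are hypothetical column names.
rows = [
    {"Accident Index": "A001", "Sex": "M", "Severity": "Slight"},
    {"Accident Index": "", "Sex": "F", "Severity": "Serious"},
    {"Accident Index": None, "Sex": None, "Severity": ""},
    {"Accident Index": "UD002", "Sex": "", "Severity": "Fatal"},
]

counts = classify_rows(rows)
print(counts)
```

The counts reported in the list are at least consistent with each other: 36,777 + 44,763 = 81,540, and 161,087 - 44,763 = 116,324.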

So there are many issues with the data. Assuming no duplications and good data, just bad accident numbers, here is one analysis that might address your question #2: 71.44% of all accidents have male drivers, and the share is about 70% for every accident severity. However, there is no accounting for what percent of drivers are male or female, or for frequency of driving. So assuming the data is okay (??), you can only speak to incidence (occurrence). (See below.)

 

image.png
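To make the chi-square idea for hypothesis 2 concrete, here is a minimal plain-Python sketch of the Pearson chi-square statistic for a sex-by-severity contingency table. The counts are invented toy numbers, not the ones in the screenshot, and the function name is my own. JMP reports the same statistic along with a p-value, which this sketch does not compute.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if sex and severity were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Invented toy counts. Rows: male, female. Columns: slight, serious, fatal.
observed = [
    [700, 250, 50],
    [300, 90, 10],
]

statistic, dof = chi_square_statistic(observed)
print(statistic, dof)
```

If the male share is roughly the same at every severity level, the expected and observed counts are close and the statistic stays small, which is what "about 70% for every severity" would suggest.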

 

Final comment: there are numerous accidents involving 1 car and no passengers, yet with numerous casualties, that are classified as slight severity?? The data seems strange unless there are numerous cases of single drivers driving into groups of people.

image.png

Bearded
Level I

Re: Help a novice with regression and chi-square! JMP

I have attached the 2 data sets that I combined to get the final working datasheet. I think there were some serious problems with the combination of both. I don't have the skill set to combine them correctly.

ron_horne
Super User

Re: Help a novice with regression and chi-square! JMP

My best guess is that each row represents one casualty. Therefore, each row has a unique combination of Accident_Index, Vehicle_Reference, and Casualty_Reference.
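One way to test that guess is to count how often each combination of the three columns occurs. Here is a minimal plain-Python sketch with invented toy rows; the function name is my own. If the guess is right, no combination should appear more than once.

```python
from collections import Counter

KEY_COLUMNS = ("Accident_Index", "Vehicle_Reference", "Casualty_Reference")


def duplicate_keys(rows, key_columns=KEY_COLUMNS):
    """Return the key combinations that occur more than once, with their counts."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Invented toy rows; the last row repeats the first key on purpose.
rows = [
    {"Accident_Index": "A001", "Vehicle_Reference": 1, "Casualty_Reference": 1},
    {"Accident_Index": "A001", "Vehicle_Reference": 2, "Casualty_Reference": 2},
    {"Accident_Index": "UD002", "Vehicle_Reference": 1, "Casualty_Reference": 1},
    {"Accident_Index": "A001", "Vehicle_Reference": 1, "Casualty_Reference": 1},
]

duplicates = duplicate_keys(rows)
print(duplicates)
```

An empty result would be consistent with one row per casualty; anything else points to duplicated or mis-joined rows.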

If this is the case, you would need to do something to bring the data and your hypothesis to a common unit of analysis (i.e., accident, vehicle, or casualty). Since this is an educational exercise, I would change the hypothesis.

You still need to be careful: because the data comes from the same accidents, the assumption of independence of the observations is most likely violated.

 

