Hey JMP community,
I currently have a research project where I am looking at two groups of pts: lupus pts, and lupus pts who have asthma. JMP has a large selection of tests that can be run on any type of data, but I have more of a medical background than a stats background, so figuring out which tests are the right ones has been a little frustrating.
So far, I have been doing pretty simple comparisons such as looking at distributions. I have also been looking at contingency tables for categorical data, as well as the mean, standard deviation, and a t test for the continuous data. All of these comparisons are between two groups: asthma yes and asthma no. Are these the correct tests I should be running? Or are there other options I could take? I've also seen an ANOVA test, but a little research has made it seem like you only use ANOVA with 3+ groups.
Also, one of the main things I am trying to do right now is look at lupus flare severity between groups (each time pts are checked they get a SLEDAI score, which is a sort of flare indicator). I'm not sure if I should just average the scores and compare them that way, or somehow make a continuous line graph that shows flares for individual pts as well as for the two groups overall.
For a little background, I have medications, labwork, social hx, etc. for all of these pts. I know there is a lot I can do, and I have ideas. I am only limited by my lack of stats knowledge.
Any resources for learning how to interpret statistics would also be appreciated.
You are certainly asking a BIG question! Aside from the approach you have already taken, you might also use Partitioning or Neural Networks to see if you can build some predictive models that reveal which factors, in addition to asthma y/n, might influence the severity of flares. You might also treat the two groups separately to determine whether different factors are important in each group.
Hope this helps!
I'd actually suggest you not worry so much about the specific test and think more about your question. What are you trying to learn from your data? From there JMP can help guide you through how to get to an answer. If you want to look at the distribution of a single variable it's Analyze > Distribution. If you want to know if there is a relationship between two variables it's Analyze > Fit Y by X. JMP is designed to guide you to an analysis based on what you are trying to understand.
You might also try to have a look at the getting started webinars (On Demand Link). They can help you understand the workflow I'm describing.
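On the ANOVA question raised in the original post: with exactly two groups, a one-way ANOVA and a pooled-variance (equal variances assumed) t test answer the same question, and the ANOVA F statistic equals the square of the t statistic. So ANOVA is not wrong for two groups, it just adds nothing over the t test. Below is a minimal Python sketch of that equivalence, outside of JMP. The scores are made-up numbers for illustration only, not real patient data, and the function names are hypothetical helpers.

```python
from statistics import mean

# Hypothetical scores for illustration only; NOT real patient data.
asthma_no = [6.0, 8.0, 4.0, 10.0, 7.0, 5.0]
asthma_yes = [3.0, 5.0, 2.0, 6.0, 4.0, 4.0]


def sum_sq(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances (pooled)."""
    df = len(a) + len(b) - 2
    pooled_var = (sum_sq(a) + sum_sq(b)) / df
    std_err = (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean(a) - mean(b)) / std_err


def anova_f(groups):
    """One-way ANOVA F statistic for any number of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_sq(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


t = pooled_t(asthma_no, asthma_yes)
f = anova_f([asthma_no, asthma_yes])
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

The printed t^2 and F agree up to floating-point rounding. The equivalence holds only for the pooled t test; an unequal-variance (Welch) t test will generally give a slightly different answer.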
And to pile onto my colleague Mike Anderson's suggestions... please, whatever you do, don't focus on singular 'statistics' such as p-values, R-squared, correlation coefficients, t values, Cp values, BIC, and all the other test statistics and measures that lead one to that dreaded disease 'mononumerosis'. Mononumerosis is trying to make a decision based on a single value in isolation. Whenever I work with someone who is relatively new to the practice of statistical methods, I give them three pieces of advice:
1. plot the data.
2. Plot The Data.
3. PLOT THE DATA.
Luckily you are using JMP, and JMP is very congruent with this notion... if you can't discover what you need in a graphic visualization of the data, whether it's using the elegant JMP Graph Builder or the myriad graphic visualizations supported in all JMP platforms, well then what you seek probably isn't there to begin with. And oh, by the way... when communicating your findings to others, in writing or face to face, use graphs as well, not test statistics or their mononumerotic brethren.
Could you clarify what you mean by not focusing so much on certain statistics? My current hypothesis is that allergy-related asthma plays a protective role in lupus. Surely the p-value, t-value, r-squared, etc. are important in determining whether to accept the alternative hypothesis? Are you saying I should only focus on the graph interpretations? Means and standard deviations?
My thought was to attempt to separate allergic/nonallergic asthma through social histories of smoking, as well as by certain medications. The test statistics are still important in determining significance, correct? And sorry, but to ask another quick question, what is the significance of the r-squared in a contingency table? I understand that in a linear graph it describes correlation, but I have a hard time interpreting it with the contingency table.
Last follow up question about what you said about presentations. So are you saying, when presenting, that it is acceptable to show a graph to interpret the data without any non-graph numerics shown?
Thank you again.
I think what Peter is trying to emphasize is to not focus on a single statistic. My favorite example of this is when someone says that a model is "bad" because of a low r^2 value even though the p-value is low, suggesting significance. In such a case the r^2 is probably just saying you've got a latent factor or data that's noisier than you thought. Getting hung up on a single statistic is becoming more of an issue. What Peter is saying (and I agree with him) is that you need to look at the plotted data, look at all the statistics, and make sure you understand what they are all telling you. Then draw a conclusion taking all of it into account.
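To make the low r^2 / low p-value case concrete, here is a minimal Python sketch (outside of JMP) using simulated data. Everything in it is an assumption chosen for illustration: the sample size, the true slope of 0.5, and the noise level. It fits a straight line by least squares and reports r^2 alongside the t statistic for the slope.

```python
import random

random.seed(0)  # fixed seed so the simulated example is repeatable

# Simulated data (assumed values): a real linear effect buried in heavy noise.
n = 500
x = [random.uniform(0, 10) for _ in range(n)]
y = [0.5 * xi + random.gauss(0, 3) for xi in x]

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                # least-squares slope
r2 = sxy ** 2 / (sxx * syy)      # share of variance in y explained by x

# t statistic for the slope in simple linear regression:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), carrying the sign of the slope.
sign = 1 if slope >= 0 else -1
t_slope = sign * (r2 * (n - 2) / (1 - r2)) ** 0.5

print(f"slope = {slope:.3f}, r^2 = {r2:.3f}, t = {t_slope:.2f}")
```

With this much noise, r^2 comes out low even though the t statistic is far beyond the rough cutoff of 2 that corresponds to the 5% level at this sample size. The effect is real but explains only part of the variation, which is why neither number should be read in isolation.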
Now that I've seen your question, I'd actually suggest that you have a look at this webinar on model building (Building Better Models). Since you are looking at multiple factors that may drive both the presence of allergy related asthma and a lupus related response, you are going to be better served by a model where you can look at interactions between factors, etc. I don't think a t-test is going to cut it. My suggestion would be decision trees - specifically the Bootstrap Forest if you have JMP Pro.
Mike Anderson in his second post has restated my message perfectly. Focusing on 'key statistics' at the expense of plotting the data or applying domain expertise is not recommended. The classic example is Anscombe's Quartet, which, by the way, is actually a data table in the JMP Sample Data Directory. Open the data table (Anscombe.jmp), run the embedded script, and select Show Points from each scatter plot's hot spot. Note that ALL the key statistics like R^2, p-values, t-values, model coefficients, and on and on are essentially identical... yet the plots show VERY different results that would lead to dramatically different ACTIONS a decision maker would take.
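For anyone reading without JMP at hand, the same point can be checked with a minimal Python sketch. The values below are Anscombe's published data typed in by hand, so treat them as an assumption and verify them against Anscombe.jmp; the helper name `summarize` is hypothetical.

```python
# Anscombe's quartet, entered by hand (verify against Anscombe.jmp).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Least-squares line and correlation for one x/y data set."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_y": mean_y,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r": sxy / (sxx * syy) ** 0.5,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} slope={s['slope']:.3f} "
          f"intercept={s['intercept']:.2f} r={s['r']:.3f}")
```

The four rows of summary numbers agree to about two decimal places, which is exactly why the summaries alone cannot tell the data sets apart. Only a scatter plot of each set shows the curve, the single outlier, and the lone high-leverage point.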