Generating summary table with p-values from iterative t-test

LinkageBee996 · Mar 13, 2024 01:35 PM

I am looking for a script that will automatically generate a summary/pivoted table of means and p-values for pairwise t-tests compared with one reference group. I am using JMP 17.

In the below data table, there are columns for SubjectId, measured BMI, measured VO2max, year, and Reference to indicate whether or not the row has a SubjectID that is a reference.

I have been generating summary tables manually (like the below) with means and then manually populating p-values for the one-sided t-tests.

I am interested specifically in the one-sided p-value results of a t-test between each SubjectID against the reference for each year to determine whether or not the non-reference SubjectID has a higher BMI or VO2max than the reference SubjectID. So far, I have been manually performing a pooled t-test and reporting the p-value and mean in a summary table I have created. Is there a way to iterate over a list of unique SubjectIDs and generate a summary table with these statistics performed in the way described using JSL? Additionally, a generalized way to add summary statistics (e.g. N for each SubjectID for each analysis) would be helpful.

txnelson · Mar 13, 2024 05:26 PM

I can not figure out what you are doing to the values in your main table, to the new display in your report table. Please detail the steps you are taking to transform the data.

Jim

Moser_stefan · Mar 14, 2024 02:58 PM

Hi @txnelson and @SDF1 , I can hopefully clarify the request and maybe focus the scope a bit. I work with @LinkageBee996 so he asked me to help clarify.

The current workflow is as follows:

1) Start with the data table that @LinkageBee996 shared. To simplify things I'm going to ignore the year column and focus only on BMI.

2) Perform a simple two-sample t-test comparing each of the Subject ID's that are labelled as 'No' for reference to the Subject ID (A1454) that is labelled as 'Yes' for reference for a total of 5 comparisons (there's a reason we're doing this vs. doing something like a dunnetts test). We currently do this one-by-one by using the local data filter in the oneway platform under 'Fit Y by X'

3) Currently the results of the t-test (p-values and such) are then manually transcribed to a blank file (CSV or JMP table). Since we're doing one test at a time, using the 'Make into Data Table' is not practical as you end up with as many data tables as you have comparisons, then would need to combine into single data table.

We'd like to automate this part of the analysis that is future proof for any new subject ID's that we may be testing, so the script should (in whatever order makes the most sense) do the following. This is what makes sense in my head from doing similar analyses in python and other software:

1. Create a list of the 'Non-reference' IDs

2.For each non-reference ID, perform a one-way analysis and two-sample t-test comparing it to the Subject ID that is labelled as 'Yes' for Reference (and let us assume that there will always be exactly one of these)

3. Save the results of the t-test to a data table. One note here: the default formats of 'make into data table' and 'make combined data table' don't save some of the metadata of the comparison - e.g., it'll tell your column header (X is subject ID, Y is BMI) but it won't tell you what the two values for X are like is shown at the top of the t-test results UI in the analysis ( 'A1454-B8565'). We would want to capture this information - you're comparing two things, what are the two things you're comparing?

4. Combine all the t-test results of every comparison in the dataset into a single data table. From there it'll just be a data manipulation problem that we can probably figure out to get it into whatever format is most useful

Hopefully this will provide some more clarity, let us know if there's still ambiguity around this

SDF1 · Mar 18, 2024 02:53 PM

Hi @Moser_stefan and @LinkageBee996 ,

I understand better what it is you're trying to do. Here are a few thoughts:

1. Based on your description, I still think restructuring your data table might be beneficial so that you can take advantage of the BY variable in the ANOVA analysis and then the Make Combined Data Table. It might take some thinking on how to reformat your data table, but I think it might be worthwhile. Even if you end up with many data tables, you can always do a Join. and combine them into a single data table in the end.

2. I think it's a great idea to automate such a process, especially if the analyses are to be repeated with future data sets. In that regard, I highly recommend that you take the time to learn JSL coding on your own. Everyone has their own specific needs when it comes to this kind of thing, and if you can learn JSL to benefit your organization, then it will propel you forward much faster than you can imagine. Scripting solutions to JMP problems makes life easier and makes the time to results much faster. There's a ton of good information out there, especially in the online Discussions forum, but also here. You can download a lot of good JMP documentation here, including the latest Scripting Guide PDF. Plus, there's also the Scripting Index under Help > Scripting Index. As mentioned before, if you're new to coding and scripting, consider using the workflow builder, which will generate the needed script for you. That being said, my recommendation is to learn it yourself so that you have more control/flexibility when it comes to coding your work.

3. Without having your actual data table to work with (even if it's anonymized to protect any sensitive information), it's very hard to generate a script for someone else that will work generically -- that is under "all" circumstances (of course nothing really ever works under all possible circumstances, but the point remains -- make it generalizable). Writing code for others heavily depends on the data table you're working with and the structure of the data within the table because it changes how you might approach the analysis and grouping results, etc.

I know this isn't really the answer you're looking for, but I think that with some time invested in either restructuring your data and/or learning JSL, you will find the payout far exceeds the efforts put in.

Good luck!,

DS

Moser_stefan · Mar 18, 2024 06:34 PM

Thanks for the response @SDF1. The JSL path makes sense, I have some experience with it but certainly no expert so a good opportunity to grow our skills here. Regardless, having your perspective on potential paths to solve this problem is useful to see what some the options are.

One thing that I'm struggling to see if how the 'by' function could be used - mainly because we have one grouping of data (the reference dataset) that we want to re-use in each comparison and one grouping (non-reference) that should be split 'by'. The only way I see this working is if we replicate the reference dataset for each comparison and then give it some unique variable (e.g., a 'comparison' column). Maybe that would work... we'll just need to think about if replicating data in the raw data table creates any other issues or risks.

Well you definitely gave us some food for thought and I appreciate the time you took to respond. We'll be sure to follow-up with whatever solution we come up with.

SDF1 · Mar 19, 2024 11:45 AM

Hi @Moser_stefan and @LinkageBee996 ,

Happy to help out. And, I think I understand a little better with how your data is structured why the BY variable would be a bit difficult. It would likely take considerable restructuring of your data to get that to work out as I thought/intended.

Instead, what you might be better off doing is taking subsets of your data based on the Subject ID column and perhaps also using the Reference column as well. Then, performing your t-test analysis on each of the Reference and Subject IDs of the subset data tables. Then, use the Make Into Data Table and then join the resulting data tables based on a column, for example Column 1 in the Make Into Data Table result. This might be the quickest way to get from your current data format to a solution without having to spend too much time restructuring your data.

In this regard, I made up a mock data table (attached) with a Source column (like your Subject ID) and the several scripts. You can examine the scripts by right clicking and selecting edit, but you can also run them -- run them as grouped by Source A or B, and then run the last script which joins the two data tables together into a final data table. From your description, it sounds like you want to do something like this. At the very least, hopefully the scripts will help and give you a starting point to address your specific needs.

Hope this helps!,

DS

Moser_stefan · Apr 1, 2024 07:52 PM

@SDF1 I just wanted to follow-up and say that we did ultimately solve this. I plan to share the script but I have to hand it to you with some dummy data as we're pulling directly from our databases. I'll outline roughly what I scripted up here. This is fully automated from step 1-9:

1.User input parameters that pass to a 'WHERE' clause in the SQL queries

2.Execute scripts to pull two data tables and combine into single data table:

Reference data (one group)
Test data (many groups)

3.Create two global variables – a list of groups in test data and a list of groups in reference data (this second one is always length 1 for now, I decided to create it so in the future we could possibly have more than one reference group)

4.Pass lists through iterative pooled t-test using 'for' loops, comparing all possible combinations of the reference group with each test group

5.Save each t-test output to data table (use 'make combined data table' feature)

6. Combine each of the combined data tables (there's one for each comparison) into a single data table by appending

7.Clean up the resulting data table

Includes: renaming columns, filtering and sub-setting data, pivoting table, adding metadata such as group IDs

8.Calculate summary statistics by group from original tables (mean, stdev, N)

9.Join summary statistics to p-value table to generate final summary table

Here's a snapshot of the final format of the table.You'll see that we have 7 'test' groups all being compared to a single reference group. Each row provides the p-value of a pooled one-sided t-test for two metrics. I've anonymized the column names and scrambled the data (the p-values are correct but ignore the actual numbers in the mean columns at the end, they won't make sense).

SDF1 · Apr 2, 2024 12:33 PM

Hi @Moser_stefan ,

Glad to hear that you've been able to automate the data collection and analysis. it's always a big help to complete these kinds of tasks as it saves so much time in the long run.

Great work!,

DS

SDF1 · Mar 14, 2024 8:23 AM

Hi @LinkageBee996 ,

I agree with @txnelson , it's difficult to understand how you're getting from your main data table to the one that you're generating manually. It might work better for your purpose to re-arrange your data table and organize it differently. This might require splitting it, or joining the data differently from how it's currently structured.

The reason I mention this is that sometimes analyses can be done using a BY group variable, and then the analysis results can be easily obtained in a larger data table by right clicking a report table and selecting Make Combined Data Table. You can see if this is maybe what would work for you by using the Big Class.jmp sample data and doing an ANOVA Fit Y by X with :sex as the x-axis and :height as the Y (to see better how the make combined data table works, use :age as the BY variable). Perform your t-test, and then right click on the statistic table of interest and select make combined data table.

If you do that for the t-test and also the Means for Oneway Anova, you can then join those data tables so that you have the mean value of each group along with the p-value for the t-test. It should end up looking something like this, which might be along the lines of what you're after.

Of course, you can always save the generated scripts in the data tables to put it all together into a single working script -- or use the Workflow Builder recorder and it can write the script for you, see here.

Hope this helps,

DS

Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test

Re: Generating summary table with p-values from iterative t-test