This blog post was authored by Shuying Han, who developed the UpSet Plot add-in as part of the Biostatistics Undergraduate Summer Internship (BUSI) program at the University of North Carolina at Chapel Hill.

Introduction
Statistical studies and clinical trials aim to uncover and communicate the stories within data, transforming raw numbers into meaningful insights. Data visualization is a crucial bridge for presenting these findings effectively.
Inspired by the work of Ballarini et al. (2020), we adopted the UpSet plot as a robust data visualization tool to explore and analyze categorical data sets. UpSet plots offer a clear and scalable approach to visualizing intersections among multiple variables, allowing users to dynamically select grouping and factor variables for customized and interactive exploration. This approach makes them particularly well-suited for uncovering patterns and relationships in complex, multidimensional data sets. Unlike traditional Venn diagrams, UpSet plots provide a more structured and intuitive way to analyze overlaps, especially in the context of clinical trials, where complexity often increases with the volume of data collected.
Example using Nicardipine.jmp
In this blog post, we use a subset of the sample data set Nicardipine.jmp (available in the JMP sample data), which comes from a clinical trial of Nicardipine treatment. After rows with duplicate subject identifiers are removed, the subset contains one row per patient (N=882). It includes key patient demographics, treatment protocols, adverse event reporting, clinical outcomes, and study metadata, offering a comprehensive view of the variables typically collected in a clinical trial and making it valuable for subgroup analyses.
(Note: To obtain a patient-level summary of Nicardipine.jmp, select the Unique Subject Identifier column, then go to Rows > Row Selection > Select Duplicate Rows. Delete the selected duplicate rows, which leaves 882 rows, one per patient. The Domain Abbreviation column and the columns from Sequence Number through Study Day of End of Adverse Event contain adverse event (AE) level data and can be deleted or ignored.)
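For readers who prepare the data outside JMP, the de-duplication step in the note above can be sketched in Python. This is a minimal sketch, not the add-in's code: the rows and the field names subject_id, sex, and event are hypothetical, and it assumes that the first row for each subject is the one kept.

```python
# Hypothetical AE-level rows: one row per adverse event, so subjects repeat.
rows = [
    {"subject_id": "101", "sex": "F", "event": "headache"},
    {"subject_id": "101", "sex": "F", "event": "nausea"},
    {"subject_id": "102", "sex": "M", "event": "fever"},
    {"subject_id": "103", "sex": "F", "event": "headache"},
    {"subject_id": "102", "sex": "M", "event": "nausea"},
]


def first_row_per_subject(rows, key="subject_id"):
    """Keep the first row seen for each subject and drop later duplicates."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


patients = first_row_per_subject(rows)
print(len(patients), "patients after removing duplicate subject rows")
```

Only the patient-level fields (here, sex) are meaningful after this step; the event field of the surviving row describes just one of that patient's adverse events, which is why the AE-level columns can be ignored.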
To analyze the relationship between Sex (Male, Female) and various clinical intervention flags, such as Swan-Ganz Monitoring ("Y" for Yes, "N" for No), Steroids ("Y" for Yes, "N" for No), and Intentional Hemodilution ("Y" for Yes, "N" for No), we present the UpSet plot dialog in Figure 1.
Figure 1. Sample UpSet plot dialog
After inputting the target factor variables and grouping variable, the UpSet Plot Add-In generated the corresponding UpSet plot (Figure 2).
Figure 2. Sample UpSet plot
Overview of the UpSet plot
- The stacked bar chart at the top shows, for each intervention flag subgroup, the proportion of patients of each Sex (Male, Female), the grouping variable selected in the dialog (Figure 1).
- The middle bar chart displays the total number of patients (N per Subgroup) in each unique intersection of intervention flags. The right bar chart displays the total number of patients for each factor level. For example, 456 patients (the fourth bar) have Intentional Hemodilution Flag = N and Steroid Flag = Y (dots are present and connected in this column). Of these 456 patients, approximately 38% are male and 63% are female. These are rounded values, so they do not sum to exactly 100%; hovering over the bars gives the exact percentages.
- The bottom part of the graph with line segments displays the individual, pairwise, and three-way subgroup combinations based on the presence and absence of each intervention flag ("Y" for Yes, "N" for No). The dots illustrate the characteristics of each subgroup.
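The quantities behind the three panels described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the add-in's implementation: the patient records, the field names, and the function upset_summary are hypothetical, and subgroups are assumed to be defined by the levels of every one-, two-, and three-factor combination.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical patient-level records (not taken from Nicardipine.jmp).
patients = [
    {"sex": "F", "swan_ganz": "Y", "steroids": "Y", "hemodilution": "N"},
    {"sex": "M", "swan_ganz": "Y", "steroids": "Y", "hemodilution": "N"},
    {"sex": "F", "swan_ganz": "N", "steroids": "Y", "hemodilution": "N"},
    {"sex": "F", "swan_ganz": "N", "steroids": "N", "hemodilution": "Y"},
    {"sex": "M", "swan_ganz": "N", "steroids": "N", "hemodilution": "Y"},
    {"sex": "F", "swan_ganz": "Y", "steroids": "Y", "hemodilution": "N"},
]
factors = ["swan_ganz", "steroids", "hemodilution"]


def upset_summary(patients, factors, group="sex"):
    """Count patients, overall and by group, in every subgroup defined by
    the levels of one, two, ... up to all of the factors."""
    subgroup_n = Counter()           # subgroup sizes (N per Subgroup)
    group_n = defaultdict(Counter)   # per-subgroup counts of each group level
    for p in patients:
        for k in range(1, len(factors) + 1):
            for subset in combinations(factors, k):
                key = tuple((f, p[f]) for f in subset)
                subgroup_n[key] += 1
                group_n[key][p[group]] += 1
    # Proportions of each group level within a subgroup (stacked bars).
    proportions = {
        key: {g: n / subgroup_n[key] for g, n in counts.items()}
        for key, counts in group_n.items()
    }
    return subgroup_n, proportions


subgroup_n, proportions = upset_summary(patients, factors)

pair = (("steroids", "Y"), ("hemodilution", "N"))
print(subgroup_n[pair], proportions[pair])
```

Single-factor keys such as (("swan_ganz", "Y"),) give the per-level totals, while multi-factor keys give the subgroup sizes and the group proportions within each subgroup.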
Insights from the UpSet plot
- Grouping factor proportions: In the stacked bar chart at the top, the female group (F) makes up the majority in every subgroup combination, ranging from 56% to 70%. This imbalance in the sex distribution may affect the results of studies where sex is a key factor.
- Subgroup size: The right bar chart shows variability in the number of patients at each intervention flag level, with the largest group containing 643 individuals, followed by 625, 489, 293, and 257. The Swan-Ganz Monitoring flag has relatively balanced representation, while the others show uneven distribution, suggesting differences in intervention utilization or patient characteristics. Depending on the study's goals, further analysis of pairwise and three-way subgroup interactions may reveal key trends.
Conclusion
The UpSet plot provides an efficient and intuitive way to visualize intersections and relationships within multidimensional data sets. Using a subset of the Nicardipine.jmp trial data set, we demonstrated how this visualization tool highlights the interplay between Sex and clinical intervention flags. By enabling dynamic exploration of subgroup patterns, the UpSet plot simplifies large, complex data and delivers actionable insights, making it a valuable resource for clinical trial analysis and decision making.
For more details and examples from the add-in, please review the UpSet Plot Add-In documentation. If you like the add-in, please give us a five-star review! New features can be requested in the Q&A.
References
Ballarini NM, Chiu Y, König F, Posch M, & Jaki T. (2020). A critical review of graphics for subgroup analyses in clinical trials. Pharmaceutical Statistics, 19: 541-560.