Categorical variables are frequently observed in clinical trials. For example, ordinal treatment variables with a set of doses (e.g., High Dose, Medium Dose, Low Dose, or Placebo) describe, in decreasing order, the amount of active pharmaceutical ingredient present in the randomized treatment. As another example, an efficacy response to ongoing treatment may be labeled as Complete, Partial, or None. In the study of treatments for solid tumors, RECIST criteria define efficacy response using a four-level scale: Complete Response, Partial Response, Stable Disease, and Progressive Disease. Further, nominal categorical data may describe important demographic characteristics, such as race and sex, or a site identifier indicating where individual patients are treated during the study.
One challenge with categorical variables is that they naturally lead to additional analyses to further characterize study findings. For example, in trials examining multiple doses, the sample size for each treatment arm is often small, which makes it challenging to obtain accurate estimates for uncommon adverse events. For a trial with multiple doses (e.g., High Dose, Medium Dose, Low Dose), it is not uncommon to observe an additional group labeled Active, which combines all patients receiving any dose of the novel drug. Tables (and figures) then present the following columns:
| Active | High Dose | Medium Dose | Low Dose | Placebo |
|---|---|---|---|---|
In some especially thorough instances, even more categories may be produced:
| Active | High | High-Medium | Medium | Medium-Low | Low | Placebo |
|---|---|---|---|---|---|---|
Further, results may be presented by important stratification factors in addition to the overall results.
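| Overall | | | | | Site 1 | | | | | Site 2 | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | H | M | L | P | A | H | M | L | P | A | H | M | L | P |

(A = Active, H = High Dose, M = Medium Dose, L = Low Dose, P = Placebo)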
A similar tendency to group adjacent levels is also of interest for efficacy or safety outcomes, resulting in additional rows in statistical presentations.
| Response |
|---|
| Complete or Partial |
| Complete |
| Partial |
| None |
In practice, regroupings of categorical levels can lead to new variables or new observations within a data table. New variables require rerunning an analysis to produce summaries with the new grouping, while new rows tend to be more easily handled through the use of By roles. In general, producing tables with columns

| High Dose | Medium Dose | Low Dose | Placebo |
|---|---|---|---|

or with grouped columns, or with rows for either the individual or the grouped response levels, is straightforward. However, neither presentation on its own is of interest; the goal is to combine columns or rows within a single display. There are benefits to this approach:
- It eliminates presenting redundant information.
- It eliminates the need to produce additional tables or figures.
- It places data for collapsed categories side by side with the other data of interest.
The downside? Generating these summaries is tricky! Producing statistical output using SAS or R requires looping through an analysis multiple times, switching out variables, deleting redundant information, and reordering data for the ideal presentation.
JMP, however, makes these analyses straightforward with the multiple response modeling type. The cells of a multiple response character column hold one or more values that effectively function as separate observations or columns, depending on how they are used. This has the benefit of maintaining a simple data structure, such as one observation per patient, while allowing for a richer, more informative analysis.
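The idea of "virtual observations" can be sketched outside JMP. The Python below is only an illustration of the concept, not JMP's implementation: it assumes a backslash-delimited cell (the delimiter used later in this post) and uses made-up patient rows and a hypothetical helper, `explode_multiple_response`, that emits one virtual observation per level in a cell while the data table itself stays at one row per patient.

```python
from collections import Counter

# Hypothetical patient rows: one row per patient, with a multiple response
# cell whose levels are separated by a single backslash.
patients = [
    {"id": 1, "categories": "High Dose\\High or Medium\\Active"},
    {"id": 2, "categories": "Medium Dose\\High or Medium\\Medium or Low\\Active"},
    {"id": 3, "categories": "Placebo"},
]

def explode_multiple_response(rows, column, delimiter="\\"):
    """Yield one virtual observation per level listed in a delimited cell."""
    for row in rows:
        for level in row[column].split(delimiter):
            level = level.strip()
            if level:  # skip empty pieces, e.g. from a trailing delimiter
                yield {**row, column: level}

virtual = list(explode_multiple_response(patients, "categories"))
counts = Counter(obs["categories"] for obs in virtual)

print(len(patients), len(virtual))                 # 3 8
print(counts["Active"], counts["High or Medium"])  # 2 2
```

Three patient rows become eight virtual observations, and each patient is counted once under every level listed in its cell.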
Figure 1 presents a sample data table, entitled Multiple Responses (available for download from the upper right of this blog post). It represents data from 100 patients, one row per patient, with covariates for Site Identifier (1, 2), Age, Biomarker (Positive, Negative), Treatment with four levels (High Dose, Medium Dose, Low Dose, Placebo), and a three-level Response (Complete, Partial, None). Other variables are present and are described below.
Figure 1. Multiple Responses data table
There are two approaches to creating a multiple response column.
- By Formula. The variable Formula Treatment Categories was created with a formula that applies an If statement to Treatment to generate the multiple responses. Its modeling type was set to Multiple Response, and a Multiple Response column property was added to define the delimiter that separates each cell into individual responses. The default delimiter is a comma, which is not practical here because one of the responses includes a comma, so a backslash is used instead. A Value Order column property was also applied to sort the values meaningfully.
- Combine Columns. Multiple columns can be combined into a single column of delimited values and a multiple response column property applied by selecting Cols > Utilities > Combine Columns (Figure 2). The columns High, Medium, Low, High or Medium, and Medium or Low were combined in this way to produce Combine Columns Treatment Categories, which is equivalent to Formula Treatment Categories. In much the same way, Response and Any Response are combined to produce the multiple response column Efficacy Response.
Figure 2. Combine columns
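To make the claimed equivalence of the two approaches concrete, here is a minimal Python sketch (not JSL, and not the formula stored in the data table). The level labels, their order, and the function names `formula_categories` and `combined_categories` are hypothetical. One function uses If-style logic on Treatment followed by a value-order sort. The other joins the names of indicator "columns" that apply, which is roughly what Combine Columns does.

```python
# Hypothetical indicator definitions, listed in an assumed value order.
# Each entry plays the role of one column that Combine Columns would join.
INDICATORS = {
    "Active": lambda t: t != "Placebo",
    "High Dose": lambda t: t == "High Dose",
    "High or Medium": lambda t: t in ("High Dose", "Medium Dose"),
    "Medium Dose": lambda t: t == "Medium Dose",
    "Medium or Low": lambda t: t in ("Medium Dose", "Low Dose"),
    "Low Dose": lambda t: t == "Low Dose",
    "Placebo": lambda t: t == "Placebo",
}
VALUE_ORDER = list(INDICATORS)  # assumed display order of the levels
DELIMITER = "\\"  # a single backslash, used instead of a comma

def formula_categories(treatment):
    """Formula approach: If-style logic maps one Treatment to all its levels."""
    levels = [treatment]
    if treatment in ("High Dose", "Medium Dose", "Low Dose"):
        levels.append("Active")
    if treatment in ("High Dose", "Medium Dose"):
        levels.append("High or Medium")
    if treatment in ("Medium Dose", "Low Dose"):
        levels.append("Medium or Low")
    levels.sort(key=VALUE_ORDER.index)  # mimic a Value Order column property
    return DELIMITER.join(levels)

def combined_categories(treatment):
    """Combine Columns approach: join the names of indicator columns that apply."""
    return DELIMITER.join(
        name for name, applies in INDICATORS.items() if applies(treatment)
    )

for t in ["High Dose", "Medium Dose", "Low Dose", "Placebo"]:
    print(f"{t:<12} {formula_categories(t)}")
```

Both functions return identical strings for every treatment level. The sort step matters: without it, the formula version would list the original treatment first and the cells would not match.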
The four scripts embedded in the Multiple Responses data table can be run to produce output from Graph Builder or Tabulate.
For example, the script Treatment Groupings Box Plot produces the analysis in Figure 3. Placing Formula Treatment Categories in the X drop zone effectively creates virtual observations for the different groupings of treatments that are summarized alongside the individual treatment levels. The sample sizes presented in the lower Y axis confirm that the grouped levels coincide with the appropriate sums of the individual levels. The addition of a 5 Number Summary (not shown) provides summary statistics corresponding to each box plot.
Figure 3. Box plots by treatment groupings
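The sample-size check described for Figure 3 amounts to a simple identity: each grouped level's n equals the sum of the n values of the levels it pools. A sketch in Python with made-up treatment assignments follows. The `GROUPS` mapping is an assumption about which levels each grouping pools, not something read from the data table.

```python
from collections import Counter

# Hypothetical treatment assignments for a handful of patients.
treatments = ["High Dose", "High Dose", "Medium Dose", "Low Dose",
              "Low Dose", "Low Dose", "Placebo", "Placebo"]

# Hypothetical groupings: each grouped level lists the individual levels it pools.
GROUPS = {
    "Active": ["High Dose", "Medium Dose", "Low Dose"],
    "High or Medium": ["High Dose", "Medium Dose"],
    "Medium or Low": ["Medium Dose", "Low Dose"],
}

individual_n = Counter(treatments)
# Virtual observations: a patient counts once in every group containing its arm.
grouped_n = Counter(
    group
    for t in treatments
    for group, members in GROUPS.items()
    if t in members
)

for group, members in GROUPS.items():
    expected = sum(individual_n[m] for m in members)
    assert grouped_n[group] == expected, group  # grouped n == sum of parts
    print(f"{group}: n = {grouped_n[group]}")
```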
Similarly, the embedded script Treatment Groupings Tabulate produces a set of summary statistics for each of the levels present among the multiple response column (Figure 4).
Figure 4. Summary statistics by treatment groupings
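A Tabulate-style summary over a multiple response column can be approximated the same way: split each cell into its levels, then compute the statistics per level. The Python below is a minimal sketch with hypothetical ages. Its choice of statistics (N, mean, median, min, max) is illustrative and does not reproduce the exact columns in Figure 4.

```python
import statistics
from collections import defaultdict

# Hypothetical (multiple response cell, age) pairs, backslash-delimited.
records = [
    ("Active\\High Dose", 54),
    ("Active\\High Dose", 61),
    ("Active\\Low Dose", 47),
    ("Placebo", 58),
    ("Placebo", 50),
]

ages_by_level = defaultdict(list)
for cell, age in records:
    for level in cell.split("\\"):
        ages_by_level[level].append(age)  # one virtual observation per level

summary = {
    level: {
        "N": len(ages),
        "Mean": statistics.mean(ages),
        "Median": statistics.median(ages),
        "Min": min(ages),
        "Max": max(ages),
    }
    for level, ages in ages_by_level.items()
}

for level, stats in summary.items():
    print(level, stats)
```

Because every patient in a dose arm is also a virtual observation of Active, the Active row summarizes the pooled ages without any change to the data table.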
Turning our attention to the Efficacy Response column, the embedded script Efficacy Response Bar Chart produces Figure 5, which summarizes the frequency with which each response occurs, including the combined level of Complete and Partial. The sample size labels presented above the bars confirm that the frequency of Any response is the sum of the frequencies of the individual response levels. The script Efficacy Response Tabulate provides a tabular display, using the Pack Columns option to produce the format typically found in the scientific literature (Figure 6).
Enterprising individuals can produce their own examples combining Formula Treatment Categories and Efficacy Response.
Figure 5. Frequency of responses by treatment
Figure 6. Frequencies and percentages of responses by treatment
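The packed "n (%)" layout of Figure 6 can be mimicked in Python. This is a sketch under stated assumptions: the patient data are hypothetical, and the percentage denominator is assumed to be the number of patients in each arm. JMP's Tabulate may compute percentages differently. The helper `response_levels` adds the combined Any response level for Complete or Partial responses.

```python
from collections import Counter

# Hypothetical (treatment, response) pairs for a few patients.
patients = [
    ("High Dose", "Complete"), ("High Dose", "Partial"), ("High Dose", "None"),
    ("Placebo", "Partial"), ("Placebo", "None"), ("Placebo", "None"),
    ("Placebo", "None"),
]
ARMS = ["High Dose", "Placebo"]
RESPONSE_ORDER = ["Any response", "Complete", "Partial", "None"]

def response_levels(response):
    """Add the combined level for Complete or Partial responses."""
    if response in ("Complete", "Partial"):
        return ["Any response", response]
    return [response]

arm_n = Counter(arm for arm, _ in patients)  # assumed denominator: patients per arm
cell_n = Counter(
    (arm, level) for arm, resp in patients for level in response_levels(resp)
)

def packed(arm, level):
    """Pack a count and its percentage into one 'n (%)' string."""
    n = cell_n[(arm, level)]
    return f"{n} ({100 * n / arm_n[arm]:.1f}%)"

for level in RESPONSE_ORDER:
    print(f"{level:<13}", *(packed(arm, level) for arm in ARMS))
```

Using patients rather than virtual observations as the denominator keeps each arm's percentages interpretable even though the Any response row overlaps the Complete and Partial rows.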
Though the multiple response modeling type is useful, there are instances where its application may not make sense, such as visualizations that communicate "parts of the whole," like pie charts, mosaic plots, and tree maps. For example, the pie chart in Figure 7 computes percentages based on a denominator of 160 because 60 virtual observations with Any response are added to the 100 patients. Further, it suggests that Any response is a level distinct from Complete and Partial rather than their combination.
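The distortion is easy to quantify from the numbers stated in this post (100 patients, 60 of them with Any response). A short Python check of that arithmetic:

```python
# Figures stated in the post: 100 patients, 60 of whom had a Complete or
# Partial response and therefore also appear under the combined level.
patients = 100
any_response = 60

virtual_total = patients + any_response  # denominator a pie chart would use
share_of_pie = any_response / virtual_total
share_of_patients = any_response / patients

print(virtual_total)               # 160
print(f"{share_of_pie:.1%}")       # 37.5%
print(f"{share_of_patients:.1%}")  # 60.0%
```

The Any response slice would show 37.5% of the pie, while 60% of patients actually responded.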
Some users may comment that Tabulate and Graph Builder are flexible enough to add multiple axes or variables to different drop zones to accommodate additional groupings without the need for a multiple response column. In many cases, this is absolutely true. However, the beauty of the multiple response column is that it can use the same script as traditional categorical columns, without the added complexity of additional features or logic to modify the script according to the needs of the analysis. Further, creating separate axes or grouping columns visually isolates the additional columns from the other levels, potentially requiring a rescaling of axes for optimal viewing.
Figure 7. Pie chart of responses
All things considered, the multiple response modeling type is a feature I plan to use more frequently. I am hopeful that these examples provide inspiration for your next analysis!