Creating a boxplot from already summarized values

pcarroll1

Community Trekker

Joined:

Aug 11, 2016

Can a boxplot be created from a table already containing the 5 summarized box and whiskers values?

2 ACCEPTED SOLUTIONS

Accepted Solutions
marie_gaudard

Community Trekker

Joined:

Jan 14, 2015

Solution

As you know, if you have only the five summarized values rather than the raw data, you cannot use a platform such as Distribution to display a boxplot that represents the raw data.  This is because the quartiles computed from your five values differ from those of the raw data.  I think that the only way to construct a rudimentary boxplot for your unsummarized data using only the five values is to build the plot in Graph Builder.
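To make the point about quartiles concrete, here is a small check outside JMP, written in Python. It is only a sketch: the five values are illustrative (borrowed from the frequency example later in this thread), and Python's `statistics.quantiles` uses its own quantile definition, which may not match the one JMP uses. The idea is simply that treating the five summary values as raw observations does not give back the original Q1 and Q3.

```python
from statistics import quantiles

# Illustrative five-number summary: lower, Q1, median, Q3, upper.
summary = [3, 15, 20, 40, 42]

# Treat the five summary values as if they were five raw observations
# and recompute the quartiles from them.
recomputed_q1, recomputed_median, recomputed_q3 = quantiles(summary, n=4)

print("Summary Q1, Q3:   ", summary[1], summary[3])
print("Recomputed Q1, Q3:", recomputed_q1, recomputed_q3)
```

The recomputed quartiles land between the summary points rather than on them, which is why a boxplot drawn directly from the five values would not represent the raw data.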

Here is how you can construct a boxplot in Graph Builder.

1.  Open the attached file. It contains summarized boxplot values for the column Weight in the sample data table Body Fat.jmp.

2.  Select Graph > Graph Builder.

3.  Select the five columns from Lower to Upper and drag them to the Y zone.

4.  Drag the Bar Chart element into the plot.  Five bars will appear.

5.  In the panel to the left, there is now a Bar panel.  From the Bar Style list, select Range.  A "box" appears in the plot.

6.  In the Bar panel, click the disclosure icon to the right of Variables.  Select (highlight) Q1 and use the arrows to move it to the top of the list.  Select Q3 and move it to immediately below Q1.

7.  Once again, drag the Bar Chart element into the plot.  Again, you will see five bars and a second Bar panel will appear to the left.

8.  In the second Bar panel, from the Bar Style list, select Interval.

9.  Click the disclosure icon to the right of Variables and uncheck Q1, Median, and Q3.

10.  In the Points panel (above the two Bar panels), uncheck Jitter.

The script in the attached file reproduces this plot.  I hope this helps.

michael_jmp

Staff

Joined:

Jun 23, 2011

Solution

I think you can use frequencies to trick Graph Builder or Distribution into treating the summary data as raw data. Enter the 5 data points in a data table, then create a second column with frequencies that force your data points to fall at the appropriate quantiles. So, for a box plot with endpoints at the 5% and 95% quantiles, you could use the five-point summary 3, 15, 20, 40, 42 with frequencies of 5, 20, 25, 25, and 20, respectively. The following JSL gives an example. Just run the data table script to see the boxplot in Graph Builder.

New Table( "Boxplot",
	Add Rows( 5 ),
	New Script(
		"Graph Builder",
		Graph Builder(
			Variables( Y( :Summary Point ), Frequency( :Freq ) ),
			Elements( Box Plot( Y, Legend( 7 ) ) )
		)
	),
	New Column( "Summary Point",
		Numeric,
		"Continuous",
		Format( "Best", 18 ),
		Set Values( [3, 15, 20, 40, 42] )
	),
	New Column( "Freq",
		Numeric,
		"Continuous",
		Format( "Best", 18 ),
		Set Values( [5, 20, 25, 25, 20] )
	)
)
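As a sanity check on the frequencies, the following Python sketch (run outside JMP) computes frequency-weighted quartiles for the same five points. The helper `weighted_quantile` is a hypothetical name, and it uses a simple empirical definition (the smallest value whose cumulative frequency reaches the requested fraction of the total); JMP's own quantile calculation may differ in detail, so treat this as an illustration of why the trick works rather than a reproduction of JMP's arithmetic.

```python
from itertools import accumulate

def weighted_quantile(values, freqs, p):
    """Return the smallest value whose cumulative frequency reaches
    fraction p of the total frequency. Assumes values are sorted ascending."""
    total = sum(freqs)
    for value, cumulative in zip(values, accumulate(freqs)):
        if cumulative >= p * total:
            return value
    return values[-1]

values = [3, 15, 20, 40, 42]   # the five summary points from the post
freqs = [5, 20, 25, 25, 20]    # the frequencies from the post

q1, median, q3 = (weighted_quantile(values, freqs, p) for p in (0.25, 0.50, 0.75))
print(q1, median, q3)
```

Under this definition the weighted quartiles fall on the second, third, and fourth summary points, so the box is drawn where the summary says it should be.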
Michael Crotty
Sr Statistical Writer
JMP Development
6 REPLIES
pcarroll1

Community Trekker

Joined:

Aug 11, 2016

Thanks.  You taught us something about the style options that bar plot has.  We will be using this concept to make some homemade representations of our distributions.

marie_gaudard

Community Trekker

Joined:

Jan 14, 2015

Great! I'm happy we could help.
marie_gaudard

Community Trekker

Joined:

Jan 14, 2015

I should have mentioned that I used JMP 13 to create the plot, but it looks as if the procedure works in JMP 12 as well.

pcarroll1

Community Trekker

Joined:

Aug 11, 2016

Thanks.  We did something similar to this by adding additional values of the median and quartiles to the data.  The concept of using the frequency term in Graph Builder is definitely a better way to do that.