breino
Level II

How can I automatically adjust bin width in distributions based on the data range itself?

When I use the Distribution platform, the default bin widths are always large and do not help in understanding the distribution. The data I have to process vary by orders of magnitude (mega, milli, unit, etc.), so a fixed bin width for all histograms does not work. In the real case I need to process, I also have hundreds of tests, so adjusting the plots one by one is not feasible. Is there a way to have the bin width of each individual plot determined by its data range? For example, I would like to simply take the range and divide it into 100 equal parts (or 50, etc.), so that regardless of the unit the appearance is roughly uniform across all plots without adjusting them individually. Attached are 3 examples of plots where I wish the default bin width were set to perhaps 1/100 of the data range.
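
To make the request concrete, the rule described above can be written as a single formula with the bin count N as the only free parameter. The symbols and the two numeric ranges below are illustrative assumptions, not values taken from the attached plots:

```latex
\[
w = \frac{x_{\max} - x_{\min}}{N},
\qquad
N = 100:\quad
\frac{0.012 - 0.002}{100} = 10^{-4},
\quad
\frac{5\times 10^{6} - 1\times 10^{6}}{100} = 4\times 10^{4}
\]
```

Both histograms would then show the same number of bins even though the raw units differ by ten orders of magnitude.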

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
breino
Level II

Re: How can I automatically adjust bin width in distributions based on the data range itself?

Thank you @ron_horne and @jthi for the code snippets. I will give them a try.


7 REPLIES
ron_horne
Super User (Alumni)

Re: How can I automatically adjust bin width in distributions based on the data range itself?

Hi @breino,

Start here:

https://community.jmp.com/t5/JMP-Add-Ins/2D-Histograms/ta-p/377945

Let us know if it helps.

breino
Level II

Re: How can I automatically adjust bin width in distributions based on the data range itself?

Hello Ron:

Thank you for the suggestion. Unfortunately, it does not handle a large number of tests: when I tried adding about 30 tests to be plotted, it gave the error "Cannot add item: list already contains maximum number of items."

I was hoping there was a general setting to adjust the bin width. Most distributions come out with wide bars, so there must be a global setting somewhere that determines how wide they are.

jthi
Super User

Re: How can I automatically adjust bin width in distributions based on the data range itself?

You could try to script it.

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Students.jmp");

colList = {"height", "weight"};

dist = dt << Distribution(
	Continuous Distribution(Column(:height)),
	Continuous Distribution(Column(:weight))
);

wait(2);

rep_layer = Report(dist);

For Each({val}, colList,
	// Calculation for the increment: one tenth of the column's data range
	newInc = Col Max(Column(dt, val)) - Col Min(Column(dt, val));
	newInc = newInc/10;
	// Find this column's section of the report by its outline title
	cur_rep = rep_layer[OutlineBox(val)];
	axisbox = cur_rep[axis box(1)];
	// Set the axis increment to the computed value
	axisbox << Inc(newInc);
);

There is also Bin Span(), but I'm not sure if it can be modified after the distribution has been built.

-Jarmo
ron_horne
Super User (Alumni)

Re: How can I automatically adjust bin width in distributions based on the data range itself?

Hi @breino,

How about this as a basis for development:


Open( "$SAMPLE_DATA/Big Class.jmp" );

Distribution(
	Continuous Distribution( Column( :height ), Set Bin Width( (Col Max( :height ) - Col Min( :height )) / 50 ) ),
	Continuous Distribution( Column( :weight ), Set Bin Width( (Col Max( :weight ) - Col Min( :weight )) / 100 ) )
);
jthi
Super User

Re: How can I automatically adjust bin width in distributions based on the data range itself?

This is most likely an improvement on my previous code, using Set Bin Width() from @ron_horne.

The looping can most likely be done in a more clever way:

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Students.jmp");

colList = {"height", "weight"};

dist = dt << Distribution(
	Continuous Distribution(Column(:height)),
	Continuous Distribution(Column(:weight))
);

wait(1);

For(i = 1, i <= N Items(colList), i++,
	newInc = Col Max(Column(dt, colList[i])) - Col Min(Column(dt, colList[i]));
	newInc = newInc/10;
	dist[i] << Set Bin Width(newInc);
);
-Jarmo
breino
Level II

Re: How can I automatically adjust bin width in distributions based on the data range itself?

Thank you @ron_horne and @jthi for the code snippets. I will give them a try.

breino
Level II

Re: How can I automatically adjust bin width in distributions based on the data range itself?

I was able to set the bin width using part of the code provided above together with Eval/Parse expressions.

Thanks.

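dt = Current Data Table();

// Get the column names as strings so they can be searched with Contains()
col = dt << Get Column Names( String );
nc = N Items( col );
colList = {};
bin_width = {};

// Keep only the columns whose names contain "::"
For( i = 1, i <= nc, i++,
	If( Contains( col[i], "::" ),
		Insert Into( colList, col[i] )
	)
);

// Bin width for each column: one tenth of its data range
For( i = 1, i <= N Items( colList ), i++,
	Insert Into( bin_width, 0.0 );
	bin_width[i] = Col Max( Column( dt, colList[i] ) ) - Col Min( Column( dt, colList[i] ) );
	bin_width[i] = bin_width[i] / 10;
);

// Build the Distribution call as a string, starting with the first column
theExpr = "dis = dt << Distribution(Stack( 1 ),
	Continuous Distribution( column( column(dt,colList[1])), Set Bin Width(bin_width[1]), Label Row( Label Orientation( \!"Perpendicular\!" ) ))
	";

// Append the remaining columns, including the last one. Char( i ) writes the
// literal index into the string so each column gets its own bin width at Eval time
For( i = 2, i <= N Items( colList ), i++,
	theExpr = theExpr || ", Continuous Distribution( column( column(dt, colList[" || Char( i ) || "])), Set Bin Width(bin_width[" || Char( i ) || "]))"
);

theExpr = theExpr || ");";

Eval( Parse( theExpr ) );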
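
For readers who would rather avoid building the call as a string, here is a minimal sketch of the same idea using a plain loop. It reuses only the calls already shown in this thread (Col Max, Col Min, Set Bin Width, and the nested column( column( dt, name ) ) form). The variable names, the value of nBins, and the choice to take every continuous column instead of filtering on "::" are assumptions for illustration. It has not been run against the data in this thread, and it opens one report window per column, which may be impractical for hundreds of tests.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

nBins = 50; // assumed target number of bins per histogram

// Names of all continuous columns, returned as strings
colNames = dt << Get Column Names( Continuous, String );

For( i = 1, i <= N Items( colNames ), i++,
	dataRange = Col Max( Column( dt, colNames[i] ) ) - Col Min( Column( dt, colNames[i] ) );
	// Skip constant or all-missing columns, where no positive bin width exists
	If( !Is Missing( dataRange ) & dataRange > 0,
		dt << Distribution(
			Continuous Distribution(
				Column( Column( dt, colNames[i] ) ),
				Set Bin Width( dataRange / nBins )
			)
		)
	);
);
```

Because the bin width is computed inside the loop, each histogram gets its own value, and no index has to be written into a string.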