Solved: How can I automatically adjust bin width in distributions based on the data range itself?

breino · Nov 4, 2021 11:25 AM

When I use the Distribution platform, the default bin widths are always large and do not help in gaining a good understanding of the distribution. The data I have to process vary by orders of magnitude (mega, milli, unit, etc.), so a fixed bin width for all histograms does not work. In the real case I also have hundreds of tests to process, so it is not feasible to adjust the plots one by one. Is there a way to have the bin width of each individual plot determined by the data range? For example, I would like to simply take the range and divide it into 100 equal parts, or 50, etc., so that regardless of the unit the appearance would be reasonably uniform across all plots without having to adjust them individually. Attached are 3 examples of plots where I wish the default bin width were set to perhaps 1/100 of the data range.

Thanks!

ron_horne · Nov 4, 2021 11:40 AM

Hi @breino ,

start here:

https://community.jmp.com/t5/JMP-Add-Ins/2D-Histograms/ta-p/377945

let us know if it helps.

breino · Nov 4, 2021 12:19 PM

Hello Ron:

Thank you for the suggestion. Unfortunately, it does not handle a large number of tests: I tried adding ~30 tests to be plotted and it gives the error "Cannot add item: list already contains maximum number of items."

I was hoping there was a general setting to adjust the bin width. Most distributions come out with wide bars, so there must be a global setting somewhere that determines how wide they are.

jthi · Nov 4, 2021 01:06 PM

You could try to script it.

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Students.jmp");

colList = {"height", "weight"};

dist = dt << Distribution(
	Continuous Distribution(Column(:height)),
	Continuous Distribution(Column(:weight))
);

wait(2);

rep_layer = Report(dist);

For Each({val}, colList,
	// axis increment: one tenth of the column's data range
	newInc = (Col Max(Column(dt, val)) - Col Min(Column(dt, val))) / 10;
	// find this column's outline in the report, then its first axis
	cur_rep = rep_layer[OutlineBox(val)];
	axisbox = cur_rep[axis box(1)];
	axisbox << Inc(newInc);
);

There is also Bin Span() but I'm not sure if it can be modified after the distribution has been built.

-Jarmo

ron_horne · Nov 4, 2021 01:08 PM

Hi @breino

how about this as a basis for development:


Open( "$SAMPLE_DATA/Big Class.jmp" );

Distribution(
	Continuous Distribution( Column( :height ), Set Bin Width( (colmax(:height)-colmin(:height))/50 ) ),
	Continuous Distribution( Column( :weight ), Set Bin Width( (colmax(:weight)-colmin(:weight))/100 ) )
	 );

jthi · Nov 4, 2021 01:16 PM

This is most likely an improvement on my previous code, using Set Bin Width() from @ron_horne.

The looping could most likely be done in a more clever way:

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Students.jmp");

colList = {"height", "weight"};

dist = dt << Distribution(
	Continuous Distribution(Column(:height)),
	Continuous Distribution(Column(:weight))
);

wait(1);

For(i = 1, i <= N Items(colList), i++,
	newInc = Col Max(Column(dt, colList[i])) - Col Min(Column(dt, colList[i]));
	newInc = newInc/10;
	dist[i] << Set Bin Width(newInc);
);

-Jarmo

breino · Nov 4, 2021 06:38 PM

Thank you @ron_horne and @jthi for the code snippets. I will give them a try.

breino · Dec 7, 2021 08:38 PM

I was able to set the bin width using a portion of the code provided above together with Eval/Parse expressions.

Thanks.

dt = currentdatatable();

col = dt << get column names();
nc = N Items( col );
colList = {};
bin_width={};

For( i = 1, i <= nc, i++,
	If(Contains (col[i],"::"),
	Insert Into( colList, col[i] ))
);

For(i = 1, i <= N Items(colList), i++,
	Insert Into(bin_width,0.0);
	bin_width[i] = Col Max(Column(dt, colList[i])) - Col Min(Column(dt, colList[i]));
	bin_width[i] = bin_width[i]/10;
);



theExpr = "dis = dt << Distribution(Stack( 1 ),
	Continuous Distribution( column( column(dt,colList[1])), Set Bin Width(bin_width[1]), Label Row( Label Orientation( \!"Perpendicular\!" ) ))
	";


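// Use <= so the last column is included, and Char( i ) so each column gets its own bin width
For( i = 2, i <= N Items( colList ), i++,
	theExpr = theExpr || ", Continuous Distribution( column( column(dt, colList[" || Char( i ) || "])), Set Bin Width(bin_width[" || Char( i ) || "]))"
);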


theExpr = theExpr || ");";

Eval( Parse( theExpr ) );
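
For reference, here is a minimal sketch of a variation that avoids building the launch script as a string. It assumes every continuous column in the current table should be plotted and that one Distribution window per column is acceptable. The names nBins, colNames and dataRange are illustrative and not part of the scripts above.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();

nBins = 50; // assumed target number of bins per histogram

// Names of all continuous columns, returned as strings
colNames = dt << Get Column Names( Continuous, String );

For( i = 1, i <= N Items( colNames ), i++,
	dataRange = Col Max( Column( dt, colNames[i] ) ) - Col Min( Column( dt, colNames[i] ) );
	// Skip constant or all-missing columns, where dataRange / nBins is not a usable width
	If( !Is Missing( dataRange ) & dataRange > 0,
		dt << Distribution(
			Continuous Distribution(
				Column( Column( dt, colNames[i] ) ),
				Set Bin Width( dataRange / nBins )
			)
		)
	);
);
```

With hundreds of columns this opens hundreds of windows, so a single launch like the Eval/Parse script may still be preferable for large tables.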
