Georg
Level VII

Robust fit outliers (huber) excludes all rows when all values are equal, why?

Dear Community,

We are processing large tables and want to exclude outliers.

We found that the Robust Fit Outliers (Huber) option of the Explore Outliers platform excludes all rows even though the summary reports no outliers; see the picture and the sample table script below to reproduce (JMP 15 or JMP 16 on Windows).

Does this really make sense, or is it a bug? I would expect different behaviour.

Georg_0-1632997174770.png

New Table( "Huber-outlier-0-sigma",
	Add Rows( 100 ),
	New Script(
		"Explore Outliers of value",
		Explore Outliers( Y( :value ), Robust Fit Outliers )
	),
	New Column( "value",
		Numeric,
		"Continuous",
		Format( "Best", 12 ),
		Set Values(
			[0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
			0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
			0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
			0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
			0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
			0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
			0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4,
			0.4, 0.4]
		),
		Set Display Width( 83 )
	),
)
Georg
3 REPLIES
peng_liu
Staff

Re: Robust fit outliers (huber) excludes all rows when all values are equal, why?

Please contact support@jmp.com to report the problem.

I cannot assess the impact of this oddity on a user's production environment, and I cannot speak for the developer as to whether there is a solution.

Here is my view from a numerical angle. In this case, all data points are indeed outliers, given how outliers are defined and how the computer calculates. The root cause lies in floating-point arithmetic. There are two issues in this case:

  1. First, 0.4 cannot be represented exactly in a computer. Imagine that 0.4 is stored as 0.40000000000000xx, with some tiny appendage xx at the end.
  2. Second, the Huber Center estimate is not exactly 0.4 either. Even the mean of a set of 0.4 values need not be exactly 0.4 in a computer, again because of floating-point arithmetic. Imagine that the Huber Center estimate is 0.40000000000000yy, with some other tiny appendage yy at the end that differs from xx.

Here is a screenshot taken after I changed the format to Fixed with enough digits after the decimal point. You can see the appendage in the Huber Center.

peng_liu_0-1633007070249.png

Given that the Huber Spread estimate is zero and every observation is numerically different from the Huber Center, the observations are indeed light years away from the Huber Center when measured in units of the spread. They are therefore outliers according to the definition and the way they are calculated in our imperfect computer.
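The argument above can be illustrated with a small sketch. It is written in Python rather than JSL and is not JMP's implementation: the function `scaled_distance`, the cutoff `K`, and the `center` value that differs from 0.4 in the last bit are all assumptions chosen for illustration only.

```python
from decimal import Decimal
import math

x = 0.4

# Decimal(x) shows the exact binary value stored for the literal 0.4,
# which is close to, but not equal to, the decimal number 0.4.
print(Decimal(x))

def scaled_distance(value, center, spread):
    """Distance of value from center, measured in units of spread.

    With a zero spread, any nonzero deviation is infinitely far away.
    """
    deviation = abs(value - center)
    if spread == 0.0:
        return 0.0 if deviation == 0.0 else math.inf
    return deviation / spread

# Hypothetical center estimate that differs from x only in the last bit
# (2**-54 is one unit in the last place for doubles in [0.25, 0.5)).
center = x + 2.0 ** -54

K = 4.0  # hypothetical cutoff, in multiples of the spread

print(scaled_distance(x, center, 0.0) > K)  # True: flagged as an outlier
print(scaled_distance(x, x, 0.0) > K)       # False: exact match, not flagged
```

The point of the sketch is only that a spread of exactly zero turns a deviation in the last bit into an unbounded scaled distance, so every row can end up beyond any finite cutoff.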

Georg
Level VII

Re: Robust fit outliers (huber) excludes all rows when all values are equal, why?

Dear @peng_liu ,

First, thank you for your response; I will forward this to support@jmp.com.

In my opinion this behaviour is not correct. As far as I understand, the dialog reports zero outliers (Huber N Outliers = 0), yet the exclude button excludes all rows.

Additionally, my workaround is to test the range of the column: if the upper and lower limits are exactly the same, I skip the outlier test (so that not all rows are excluded). As this works perfectly now, I conclude that JMP considers the numbers to be exactly the same.
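The logic of this workaround can be sketched as follows. This is a language-neutral illustration in Python, not the JSL actually used; the helper name `should_screen` is hypothetical, and the outlier screening itself is left out.

```python
def should_screen(values):
    """Return False for an empty or constant column, True otherwise.

    The check compares the stored minimum and maximum directly, so no
    arithmetic (and therefore no rounding) is involved.
    """
    present = [v for v in values if v is not None]  # ignore missing cells
    if not present:
        return False
    return max(present) != min(present)

column = [0.4] * 100  # same shape as the sample table: 100 rows of 0.4

if should_screen(column):
    print("Column varies; run the outlier screening.")
else:
    print("Column is constant; skip the outlier screening.")
```

A guard like this compares the stored values with each other, whereas the explanation given earlier in the thread concerns a center estimate that is computed from them, so the two observations need not contradict each other.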

Best regards

Georg

Re: Robust fit outliers (huber) excludes all rows when all values are equal, why?

Thank you for reporting this issue to JMP Support. I can confirm that this is being addressed in the next major release. For anyone else who stumbles upon this issue: the problem is addressed in JMP 17 (to be released in Fall 2022).