JSL method to get the upper control limits from multivariate outlier analysis

What inspired this wish list request? 

 

The Multivariate platform's Outlier Analysis lets you save Mahalanobis distances, jackknife distances, etc. to a column. It also plots them and shows a calculated upper control limit (UCL) for the current alpha, which is very useful for judging whether a point is an outlier. JMP stores this UCL as a single value in a column property (e.g. Jackknife Value on the Jackknife Distances column). However, with a By group there is a distinct UCL for every group (based on the number of non-missing points in the group), and the single value in the column property appears to be for the last group only.

 

One might think you could extract them from the platform's report, but that doesn't seem possible: they are in FrameBoxes, not TextBoxes, and I don't see a way to extract those values.

 

 

 

What is the improvement you would like to see? 

 

In my use case, with a By group, I'd like the UCLs saved in a column as well. More generally, I'd like to be able to ask the platform for those values, similar to the way you can ask a fit in Fit Model for various calculated values using << Get Estimates, and get back a list of them for all the By groups.

 

Related: in cases like this it would be nice to have a good way to << Get what the groups are, so that the list of values can be used reliably. At the moment I'm not sure how to do that without parsing the groups out of the OutlineBox names in the report. A list of groups in the same order as the values returned by << Get Estimates would tell you for sure which group each value belongs to. It has come to my attention that platforms differ in which groups they include in their report (some skip groups with inadequate data), and the order of the groups may not be as obvious as it used to be now that JMP uses "numerical ordering" in some cases.

 

 

Why is this idea important? 

 

I think it's clear why the UCLs are useful: the UCL is what you compare a distance to in order to decide whether it is large at a given confidence level.

 

Here's a community post where someone wanted this value. I guess they weren't using a By group, because they were happy with the single value in the column properties: Solved: Jackknife Predicted Values - JMP User Community

 

 

 

6 Comments
jthi
Super User

You can export those values with scripting. This isn't the best method, but it should show that it is possible:

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Solubility.jmp");
dt << New Column("Group", Numeric, Continuous, Formula(
	If(Row() < 20, 1, Row() < 40, 2, 3);
));

mv = dt << Multivariate(
	Y(:Ether, :Chloroform, :Benzene, :Carbon Tetrachloride, :Hexane),
	Variance Estimation("Row-wise"),
	Scatterplot Matrix(1),
	By(:Group)
);

mv << Jackknife Distances(1, Save Jackknife Distances);

window = mv[1] << Top Parent;
fbs = window << XPath("//OutlineBox[text()='Jackknife Distances']//FrameBox");
fbs_of_interest = fbs[2::NItems(fbs)::2];
segs = fbs_of_interest << Find Segs(CustomStreamSeg(1));
seg_scripts = segs << get script;

ucls = {};
For Each({seg_script}, seg_scripts,
	Insert Into(ucls, seg_script[1]["Text"][2]);
);

show(ucls);
hardner
Level VI

Clever!  I had tried to use FindSegs unsuccessfully.  But it's not changing my wish for a way to directly request those values.

hogi
Level XII

Kudo + Kudo for the nice workaround

@jthi how did you find out that it is a CustomStreamSeg?

Is there something like << list segs?

 

hogi_0-1701978311584.png

hogi_1-1701978339313.png

 

 

jthi
Super User

Most likely by guessing and trying different things: I know it is a FrameBox, so I know there could be display segments, and I go from there (check the XML, use << Find Segs, or just start with the XML). Accessing that weird CustomStreamSeg isn't necessary, as you can get the UCL value from the "earlier" FrameBox by accessing its LineSeg:

Names Default To Here(1);
dt = Open("$SAMPLE_DATA/Solubility.jmp");
dt << New Column("Group", Numeric, Continuous, Formula(
	If(Row() < 20, 1, Row() < 40, 2, 3);
));

mv = dt << Multivariate(
	Y(:Ether, :Chloroform, :Benzene, :Carbon Tetrachloride, :Hexane),
	Variance Estimation("Row-wise"),
	Scatterplot Matrix(1),
	By(:Group)
);
mv << Jackknife Distances(1, Save Jackknife Distances);

window = mv[1] << Top Parent;
linesegs = window << XPath("//FrameBox[@helpKey = 'Multiv Outlier']/LineSeg[@description='UCL Line']");

ucls = {};
For Each({yvals}, linesegs << Get Y Values,
	Insert Into(ucls, yvals[1]);
);
show(ucls);
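
To cover the original request of having the UCLs in a column, the list could then be written back to the data table. This is only a sketch that continues from the script above (it reuses `dt` and `ucls`). It assumes the order of `ucls` matches the order of the group levels returned by Summarize and that the platform did not skip any group; neither is guaranteed. The column name "Jackknife UCL" is made up for illustration.

```jsl
// Continues from the script above: expects dt and ucls to exist.
// Summarize returns the By levels as a list of strings.
Summarize(dt, groups = By(:Group));

// Stop if the report did not produce exactly one UCL per group
// (for example because a group was skipped for lack of data).
If(N Items(groups) != N Items(ucls),
	Throw("Number of UCLs does not match number of groups")
);

// Hypothetical column name, used only for this example.
dt << New Column("Jackknife UCL", Numeric, Continuous);

// ASSUMPTION: ucls[i] belongs to groups[i].
For Each Row(dt,
	idx = Contains(groups, Char(:Group));
	If(idx > 0, :Jackknife UCL = ucls[idx]);
);
```

Because the match between a UCL and its group rests on ordering alone, it is worth spot-checking one or two groups against the report before relying on the column.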
hogi
Level XII

Thanks

nice trick to use the XML here as well - I expected it to stop on FrameBox level.

cool that XPath works here as well ...

hardner
Level VI

A couple of other observations about this that might be worth a look: when you save the jackknife distances this way (from the Multivariate platform's Outlier Analysis), you don't get a formula column, although you do if you save Mahalanobis distances or T squared. The results for jackknife distances do match the formula described in the Multivariate Methods documentation. However, that formula doesn't consistently give a missing value in the simple 1-D case where all the other values are the same, so that their standard deviation is 0; in that case it can give either a missing value or a very large value.