
Replacing NAs with Randomly generated numbers

Hi, 

 

I'm very new to JMP and I am working with a dataset of measured element concentrations that has a few non-detects, labelled "ND", scattered throughout. I have been instructed to replace all of the NDs with a value randomly drawn from a uniform distribution between 0 and the daily limit of detection (LOD) determined by a two-point calibration. I have the LOD values for each element; I am simply stuck on how to generate the random numbers and put them in place of the NDs.

 

Any advice is greatly appreciated!

2 REPLIES
txnelson
Super User

Re: Replacing NAs with Randomly generated numbers

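Welcome to the Community.  Here is a little script that shows one approach to changing the NA values to a random value. It draws each value from a normal distribution centered between 0 and the LOD and redraws until the value falls within that range.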

Names Default To Here( 1 );

LOD = 99;  // Change to be the actual LOD value
randomMean = Mean( 0, LOD );  // Midpoint of the range [0, LOD]
randomSTD = (LOD - randomMean) / 3;  // Puts 0 and LOD at +/- 3 standard deviations
col = "Age";   // Change to the actual column name

For Each Row(
	If( As Column( col ) == "NA",  // Change "NA" to the actual non-detect label, e.g. "ND"
		randomVal = LOD + 1;  // Start outside the acceptable range so the loop runs at least once
		// Redraw until the value falls within [0, LOD]
		While( randomVal < 0 | randomVal > LOD, randomVal = Random Normal( randomMean, randomSTD ) );
		As Column( col ) = Char( randomVal );
	)
);

// Convert column to numeric
Column( col ) << data type( numeric ) << modeling type( continuous );
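
If the replacement values should instead come from the uniform distribution described in the question, with a separate LOD for each element, something along the following lines might work. This is only a minimal sketch: the column names "Arsenic" and "Lead", their LOD values, and the "ND" label are placeholders, and it assumes each element column is still stored as character data.

```jsl
Names Default To Here( 1 );

dt = Current Data Table();

// Hypothetical element columns and LOD values; replace with the real ones
lods = Associative Array();
lods["Arsenic"] = 0.5;
lods["Lead"] = 0.2;
ndLabel = "ND";  // Assumed non-detect label

cols = lods << Get Keys;
For( i = 1, i <= N Items( cols ), i++,
	colName = cols[i];
	lod = lods[colName];
	// Replace each non-detect with a draw from Uniform(0, LOD)
	For( r = 1, r <= N Rows( dt ), r++,
		If( Column( dt, colName )[r] == ndLabel,
			Column( dt, colName )[r] = Char( Random Uniform() * lod )
		)
	);
	// Convert the column to numeric once the text labels are gone
	Column( dt, colName ) << Data Type( Numeric ) << Modeling Type( Continuous );
);
```

Random Uniform() with no arguments returns a value between 0 and 1, so multiplying by the LOD scales the draw to the range 0 to LOD. Run it on a copy of the table first, since the replacement overwrites the ND labels.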
Jim

Re: Replacing NAs with Randomly generated numbers

You can expect great help from others with implementing your idea. I want to address the validity of the approach in the first place. This scheme is one of many ad hoc approaches without rigor or theoretical support. It will bias the answers. It might seem like selecting a replacement value from a random uniform distribution [0...LOD] would prevent bias. It will not.

A principled, rigorous approach is to treat each non-detect as a left-censored observation (i.e., the LOD is an upper bound on the true value) and use maximum likelihood estimation for the analysis. JMP does not provide the MLE for everything, but it does for many analyses. What kind of analysis were you planning for this data?
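
To make the censoring idea concrete, here is a sketch of the likelihood that such an analysis maximizes. The notation is illustrative rather than taken from JMP: it assumes a parametric density f with cumulative distribution function F and parameters θ, where D indexes the detected values and C indexes the non-detects.

```latex
L(\theta) = \prod_{i \in D} f(x_i; \theta) \; \prod_{j \in C} F(\mathrm{LOD}_j; \theta)
```

Each detected value contributes its density, while each non-detect contributes only the probability that the true value lies below its LOD, so no replacement value has to be invented.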