Replacing NAs with Randomly generated numbers

LossEffectPuma9
Level I

Hi,

I'm very new to JMP and I am working with a dataset of measured element concentrations that has a few non-detects, labelled "ND", scattered throughout. I have been instructed to replace each ND with a value randomly drawn from a uniform distribution between 0 and the daily limit of detection (LOD) determined by a two-point calibration. I have the LOD values for each element; I am simply stuck on how to generate the random numbers and put them in place of the NDs.

 

Any advice is greatly appreciated!

2 REPLIES
txnelson
Super User


Re: Replacing NAs with Randomly generated numbers

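Welcome to the Community.  Here is a short script that shows one approach to replacing the NA values with random values between 0 and the LOD. It draws from a normal distribution centered at LOD/2 and redraws any value that falls outside that range.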

Names Default To Here( 1 );

LOD = 99;  // Change to be the actual LOD value
randomMean = Mean( 0, LOD );
randomSTD = (LOD - randomMean) / 3;
col = "Age";   // Change to the actual column name

For Each Row(
	If( As Column( col ) == "NA",   // Use "ND" if that is how the non-detects are labelled
		randomVal = LOD + 1;  // Start outside the acceptable range so the loop runs at least once
		// Redraw until the value falls inside [0, LOD]
		While( randomVal < 0 | randomVal > LOD, randomVal = Random Normal( randomMean, randomSTD ) );
		As Column( col ) = Char( randomVal );
	)
);

// Convert column to numeric
Column( col ) << data type( numeric ) << modeling type( continuous );
Jim
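
Since the original question asks for a uniform draw between 0 and the LOD rather than a truncated normal one, a minimal variation on the script above might look like the following. It is only a sketch: the column name "Age", the LOD of 99, the "ND" label, and the seed are placeholders to change, and it assumes a single LOD applies to the whole column.

```jsl
Names Default To Here( 1 );

LOD = 99;      // Placeholder: change to the actual LOD for this element
col = "Age";   // Placeholder: change to the actual column name

Random Reset( 12345 );  // Optional: fixed seed so the replacements are reproducible

For Each Row(
	If( As Column( col ) == "ND",
		// Uniform draw between 0 and LOD, stored as text because the column is still character
		As Column( col ) = Char( Random Uniform( 0, LOD ) )
	)
);

// Convert column to numeric once no "ND" labels remain
Column( col ) << data type( numeric ) << modeling type( continuous );
```

With a different LOD for each element, the same script could be rerun per column with that column's LOD. If the LOD changes from day to day, the value would have to be looked up per row instead of held in one variable.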


Re: Replacing NAs with Randomly generated numbers

You can expect great help from others with implementing your idea. I want to address the validity of the approach in the first place. This scheme is one of many ad hoc substitution approaches that lack rigor or theoretical support. It will bias the resulting estimates. It might seem like selecting a replacement value from a random uniform distribution [0...LOD] would prevent bias. It will not.

A principled, rigorous approach is to treat each non-detect as a left-censored observation (i.e., the LOD is an upper bound on the true value) and use maximum likelihood estimation for the analysis. JMP does not provide the MLE for everything, but it does for many analyses. What kind of analysis were you planning for this data?
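
To make the left-censoring idea concrete, the likelihood can be sketched as below. The notation is assumed for illustration and is not from the thread: f and F are the density and cumulative distribution function of whatever distribution is chosen for the concentrations, θ is its parameter vector, D is the set of detected measurements, and C is the set of non-detects.

```latex
L(\theta) \;=\; \prod_{i \in D} f(x_i;\,\theta)\;\prod_{j \in C} F(\mathrm{LOD}_j;\,\theta)
```

Each detected value contributes its density, while each non-detect contributes only the probability of falling below its LOD, so no replacement value ever has to be invented.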