Replacing NAs with Randomly generated numbers
Hi,
I'm very new to JMP and I am working with a dataset of measured element concentrations that has a few non-detects, labelled "ND", scattered throughout. I have been instructed to replace each ND with a value randomly drawn from a uniform distribution between 0 and the daily limit of detection (LOD), which is determined by a two-point calibration. I have the LOD values for each element; I am simply stuck on how to generate the random numbers and put them in place of the NDs.
Any advice is greatly appreciated!
Re: Replacing NAs with Randomly generated numbers
Welcome to the Community. Here is a little script that shows one approach to changing the NA values to a random value.
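Names Default To Here( 1 );
LOD = 99; // Change to the actual LOD value
randomMean = Mean( 0, LOD ); // Center of the [0, LOD] range
randomSTD = (LOD - randomMean) / 3; // Puts LOD about 3 standard deviations above the mean
col = "Age"; // Change to the actual column name
// Note: this draws from a normal distribution truncated to [0, LOD];
// for a uniform draw, Random Uniform( 0, LOD ) could be used instead.
For Each Row(
	If( As Column( col ) == "NA", // Change "NA" to "ND" if that is the label in your data
		randomVal = LOD + 1; // Start outside the acceptable range to force at least one draw
		// Redraw until the value falls inside [0, LOD]
		While( randomVal < 0 | randomVal > LOD,
			randomVal = Random Normal( randomMean, randomSTD )
		);
		As Column( col ) = Char( randomVal );
	)
);
// Convert column to numeric
Column( col ) << data type( numeric ) << modeling type( continuous );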
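The script above draws from a normal distribution, whereas the original question asked for a uniform draw between 0 and a LOD that changes from day to day. As a rough, language-neutral sketch of that logic (written in Python rather than JSL so it can be run and checked outside JMP), the substitution might look like the following. The function name `replace_non_detects`, the `"ND"` marker, and the idea of passing one LOD per row are assumptions made for illustration, not part of the thread.

```python
import random


def replace_non_detects(values, lods, marker="ND", rng=None):
    """Return a numeric copy of `values`, replacing each non-detect marker
    with a draw from Uniform(0, LOD) using that row's own LOD.

    `values` holds strings such as "1.5" or "ND"; `lods` holds one
    LOD per row (hypothetical layout, e.g. the daily LOD for that row).
    """
    if len(values) != len(lods):
        raise ValueError("values and lods must have the same length")
    rng = rng or random.Random()
    result = []
    for value, lod in zip(values, lods):
        if value == marker:
            # Non-detect: substitute a uniform draw between 0 and this row's LOD
            result.append(rng.uniform(0.0, lod))
        else:
            # Detected value: just convert the text to a number
            result.append(float(value))
    return result
```

Calling `replace_non_detects(["1.5", "ND", "3.2"], [0.5, 0.5, 0.8])` would leave the detected values unchanged and fill the middle entry with a value between 0 and 0.5. Passing a seeded `random.Random` as `rng` makes the substitution reproducible.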
Re: Replacing NAs with Randomly generated numbers
You can expect great help from others with implementing your idea. I want to address the validity of the approach in the first place. This scheme is one of many ad hoc approaches without rigor or theoretical support. It will bias the answers. It might seem like selecting a replacement value from a random uniform distribution [0...LOD] would prevent bias. It will not.
A principled, rigorous approach is to treat each non-detect as a left-censored observation (i.e., the LOD is an upper bound on the true value) and use maximum likelihood estimation (MLE) for the analysis. JMP does not provide the MLE for everything, but it does for many analyses. What kind of analysis were you planning for this data?