Subscribe Bookmark RSS Feed

How can I set a cell value to "missing" through a script?

wblack0

Community Trekker

Joined:

Sep 22, 2015

I have a script that I use to remove out of family data points from large numeric data sets.  When a value was found that was outside a defined distribution then that value was set to missing by using the code "columnReference[specificCell] = ."

I used to work great but now it seems the script does not like me setting the cell value to "."  Any suggestions on how I can set a cell value to "missing"?

9 REPLIES
ian_jmp

Staff

Joined:

Jun 23, 2011

Feels like something else must have changed in your code. For example, this works:

dt = Open("$SAMPLE_DATA/Big Class.jmp");

hVals = Column(dt, "height") << getValues;

hVals[Loc(hVals > 60)] = .;

Column(dt, "height") << setValues(hVals);

What error do you see?

wblack0

Community Trekker

Joined:

Sep 22, 2015

The only feedback I get from the log is the following.

For( badPt = 1, badPt <= N Row( badDataPoints ), badPt++,

        cleaningColumnRef[badDataPoints[badPt]] = /*###*/. /*###*/

);

markbailey

Staff

Joined:

Jun 23, 2011

BTW, you do not need to explicitly loop through the target rows. You can simply and simultaneously assign missing to them using the multiple subscripts this way:

cleaningColumnRef[badDataPoints] = .;

Learn it once, use it forever!
ms

Super User

Joined:

Jun 23, 2011

It should be straightforward to assign a missing value to a cell.

Maybe the column has been locked, given a formula property or set to character data type?

wblack0

Community Trekker

Joined:

Sep 22, 2015

I double checked column being locked or of the wrong data type.

Thanks for the feedback.  A capability report was added to the data I was analyzing.  I guess since this was referencing the data table it would not allow me to delete individual cells through a script.  It will however let me delete the bad data points manually.  If I close the capability report then my script works again.

Seems like there should be a script function that will let me delete the data even though the capability report is open since I can do it manually.

 

ms

Super User

Joined:

Jun 23, 2011

You can explore mimicking a manual delete with the MainMenu("Delete") command. A warning, save your files before running because a forced "manual" delete will erase whatever is selected in the front window.

Example using your terminology but without a loop:

dt = Open("$SAMPLE_DATA/Big Class.jmp");

badDataPoints = [1, 10];

cleaningColumnRef = Column("height");

dt << Go To(cleaningColumnRef);

dt << select rows(badDataPoints);

dt << Bring Window To Front;

Main Menu("Delete");


msharp

Super User

Joined:

Jul 28, 2015

Just use:

"columnReference[specificCell] = empty()"

rmurphy

Community Trekker

Joined:

May 22, 2012

Does anyone have a recommendation on which approach for executing a number of these types of data cleaning operations has the best performance?

It seems that there are quite a few ways to accomplish these replacements, but for large data sets selecting the fastest may be important.

I used the multiple subscripts approach on a 10,000 column by 750 rows data set, and it took several minutes to replace 0 with missing.

ian_jmp

Staff

Joined:

Jun 23, 2011

I take it from this thread that you would like to do such operations programmatically. It's difficult to give general guidelines because there are so many possibilities. If you are really concerned about efficiency, you can profile your code to see where the bottlenecks are:

http://www.jmp.com/support/help/Debug_or_Profile_Scripts.shtml

In the specific case of replacing 0 with missing, if all the columns in your data table (dt) are numeric continuous, you can do:

m = dt << GetAsMatrix;

m[Loc(m == 0)] = .;

dt2 = AsTable(m);

Then you would have to fix up the column names and properties in dt2 of course.