Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

- JMP User Community
- :
- Discussions
- :
- Data QA and Reformatting

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Aug 24, 2020 2:13 PM
(666 views)

Hi all,

A common slow down procedure in my workflow is that a client is sending us a spreadsheet with results containing '< 0.01' or some other arbitrary number for samples that are below a detection limit. We have an internal R code that calculates imputation for these values, but we need to reformat the data set (example attached here). From the file we need to create two separate CSVs, one with non-detects replaced with zero (Detect file attached) and another one where all the values are zeros and the non-detects are replaced with the detection limit without the '<' symbol (Non-Detect attached file).

Does anyone know how to automate this process to create these two CSVs from the data set? Would be thrilled if JMP has this functionality or if anyone could provide a general script.

2 ACCEPTED SOLUTIONS

Accepted Solutions

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

This is a fairly simple task for JMP. Run this code (after editing the <Your Path> sections) and see if it gives you what you want. Then you need to go to the JMP Documentation Library, and read the Scripting Guide, so you can understand what the code below is doing

```
Names Default To Here( 1 );
// Open the Excel spreadsheet that has the issues
dt = Open( "<your path>\Sample Client Data.xlsx" );
// Get all of the character columns
colNames = dt << get column names( string, character );
// Go across all of the character columns, and if "<0.0" is found in one
// or more cells, then change those cells to 0. Then convert the column
// to a numeric column
For( i = 1, i <= N Items( colNames ), i++,
dt << select where( Contains( As Column( colNames[i] ), "<0.0" ) );
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
Column( dt, colNames[i] )[selRows] = "0";
Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous );
);
);
// Save the Detect version of the data
dt << save( "<your path>\Detect.csv" );
Close( dt, nosave );
// Reopen the Excel spreadsheet that has issues
dt = Open( "<your path>\Sample Client Data.xlsx" );
// Get all of the character columns
colNames = dt << get column names( string, character );
// Go across all of the character columns, and if "<0.0" is found in one
// or more cells, then loop through the cells getting rid of the "<"
// Then convert the column to a numeric column
For( i = 1, i <= N Items( colNames ), i++,
dt << select where( Contains( As Column( colNames[i] ), "<0.0" ) );
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
For( k = 1, k <= N Rows( selRows ), k++,
theRow = selRows[k];
Column( dt, colNames[i] )[theRow] = Substr( Column( dt, colNames[i] )[theRow], 2 );
);
Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous );
);
);
// Save the NonDetect version of the data
dt << save( "<your path>\NonDetect.csv" );
Close( dt, nosave );
```

Jim

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Ah, I see the issue.....and I have corrected the issue below.....I believe. But first, let me clarify something first. When JMP reads in the different columns, it read through all of the data for a given column, and decides if the column is a numeric or a character column. All values in a column need to be numeric values for the column to be considered numeric. Therefore, when JMP finds a column with a non numeric cell, i.e <0.0005, it declares the column to be a character column. My code takes advantage of that, in that it allow the code to find the columns where a "<" has been found. It also allows for JSL to use character manipulation functions such as Substr() to get rid of the "<" and to leave a valid numeric in the cell. But then, once all of the "<" are removed, the column can then be changed to a numeric column......<< data type( numeric ). Here is my new code, converting all of the columns it finds as being numeric right after the data has been read in, and before the "<" columns have been processed, to 0. I am assuming you want to change all values in a numeric column to 0, even if it was a negative value. The code then goes on to correct the "<" columns, which changes all numeric cells to 0 except for the cells that had "<".

```
dt = Open( "C:\users\jim\documents\discussion group\Sample Client Data.xlsx" );
// Get all of the character columns
colNames = dt << get column names( string, character );
colNamesNum = dt << get column names( string, numeric );
// zero out all current numeric columns
For( i = 2, i <= N Items( colNamesNum ), i++,
column( dt, colNamesNum[i] )[1::N Rows(dt)] = 0;
);
// Go across all of the character columns, and if "<0.0" is found in one
// or more cells, then loop through the cells getting rid of the "<"
// Then convert the column to a numeric column
For( i = 1, i <= N Items( colNames ), i++,
dt << select where( ! Contains( As Column( colNames[i] ), "<0.0" ) );
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
column( dt, colNames[i] )[selRows] = "0";
);
dt << invert row selection;
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
For( k = 1, k <= N Rows( selRows ), k++,
theRow = selRows[k];
Column( dt, colNames[i] )[theRow] = Substr( Column( dt, colNames[i] )[theRow], 2 );
);
Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous );
);
);
// Save the NonDetect version of the data
dt << save( "<your path>\NonDetect.csv" );
Close( dt, nosave );
```

Jim

6 REPLIES 6

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

This is a fairly simple task for JMP. Run this code (after editing the <Your Path> sections) and see if it gives you what you want. Then you need to go to the JMP Documentation Library, and read the Scripting Guide, so you can understand what the code below is doing

```
Names Default To Here( 1 );
// Open the Excel spreadsheet that has the issues
dt = Open( "<your path>\Sample Client Data.xlsx" );
// Get all of the character columns
colNames = dt << get column names( string, character );
// Go across all of the character columns, and if "<0.0" is found in one
// or more cells, then change those cells to 0. Then convert the column
// to a numeric column
For( i = 1, i <= N Items( colNames ), i++,
dt << select where( Contains( As Column( colNames[i] ), "<0.0" ) );
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
Column( dt, colNames[i] )[selRows] = "0";
Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous );
);
);
// Save the Detect version of the data
dt << save( "<your path>\Detect.csv" );
Close( dt, nosave );
// Reopen the Excel spreadsheet that has issues
dt = Open( "<your path>\Sample Client Data.xlsx" );
// Get all of the character columns
colNames = dt << get column names( string, character );
// Go across all of the character columns, and if "<0.0" is found in one
// or more cells, then loop through the cells getting rid of the "<"
// Then convert the column to a numeric column
For( i = 1, i <= N Items( colNames ), i++,
dt << select where( Contains( As Column( colNames[i] ), "<0.0" ) );
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
For( k = 1, k <= N Rows( selRows ), k++,
theRow = selRows[k];
Column( dt, colNames[i] )[theRow] = Substr( Column( dt, colNames[i] )[theRow], 2 );
);
Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous );
);
);
// Save the NonDetect version of the data
dt << save( "<your path>\NonDetect.csv" );
Close( dt, nosave );
```

Jim

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Data QA and Reformatting

Hi Jim,

I've looked this over and it's missing the second part of the task. We need a script for the 'Non Detect' to also change all the values into zeros.

I want to run the loop, select all the cells that have values > 0 (excluding the ones that have "<0.0" DL values).

The last part of the code you sent works great, I just need to insert 'NonDetectPreprocess.csv' into it.

I've tried using this custom script I made:

// Reopen the Excel spreadsheet that has issues dt = Open( "D:\MD\Datto Workplace\Projects\Internal\JMP Scripts\Zero Imputation\Fake Client Data.csv" ); // Go across all of the character columns, and if concentration is greater than zero is found in one // or more cells, then change those cells to 0. Then convert the column // to a numeric column For( i = 1, i <= N Items( colNames ), i++, dt << select where( dt > 0 ); selRows = dt << get selected rows; If( N Rows( selRows ) > 0, Column( dt, colNames[i] )[selRows] = "0"; Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous ); ); ); //Save this sheet for next step dt << save("D:\MD\Datto Workplace\Projects\Internal\JMP Scripts\Zero Imputation\NonDetectPreprocess.csv" ) Close( dt, nosave );

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Data QA and Reformatting

Good try....but I think this is what you want

```
// Reopen the Excel spreadsheet that has issues
dt = Open( "<your path>\Sample Client Data.xlsx" );
// Get all of the character columns
colNames = dt << get column names( string, character );
// Go across all of the character columns, and if "<0.0" is found in one
// or more cells, then loop through the cells getting rid of the "<"
// Then convert the column to a numeric column
For( i = 3 i <= N Items( colNames ), i++,
dt << select where( ! Contains( As Column( colNames[i] ), "<0.0" ) );
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
column( dt, colNames[i] )[selRows] = "0";
);
dt << invert row selection;
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
For( k = 1, k <= N Rows( selRows ), k++,
theRow = selRows[k];
Column( dt, colNames[i] )[theRow] = Substr( Column( dt, colNames[i] )[theRow], 2 );
);
Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous );
);
);
// Save the NonDetect version of the data
dt << save( "<your path>\NonDetect.csv" );
Close( dt, nosave );
```

Jim

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Data QA and Reformatting

Hi Jim,

I noticed you did an inverted row selection. Although this helps, the problem is that only the columns that contained the, "<0.0", are selected. How do I select all the columns that have numeric data in the data set and change those?

Thanks for the help!

Highlighted

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Ah, I see the issue.....and I have corrected the issue below.....I believe. But first, let me clarify something first. When JMP reads in the different columns, it read through all of the data for a given column, and decides if the column is a numeric or a character column. All values in a column need to be numeric values for the column to be considered numeric. Therefore, when JMP finds a column with a non numeric cell, i.e <0.0005, it declares the column to be a character column. My code takes advantage of that, in that it allow the code to find the columns where a "<" has been found. It also allows for JSL to use character manipulation functions such as Substr() to get rid of the "<" and to leave a valid numeric in the cell. But then, once all of the "<" are removed, the column can then be changed to a numeric column......<< data type( numeric ). Here is my new code, converting all of the columns it finds as being numeric right after the data has been read in, and before the "<" columns have been processed, to 0. I am assuming you want to change all values in a numeric column to 0, even if it was a negative value. The code then goes on to correct the "<" columns, which changes all numeric cells to 0 except for the cells that had "<".

```
dt = Open( "C:\users\jim\documents\discussion group\Sample Client Data.xlsx" );
// Get all of the character columns
colNames = dt << get column names( string, character );
colNamesNum = dt << get column names( string, numeric );
// zero out all current numeric columns
For( i = 2, i <= N Items( colNamesNum ), i++,
column( dt, colNamesNum[i] )[1::N Rows(dt)] = 0;
);
// Go across all of the character columns, and if "<0.0" is found in one
// or more cells, then loop through the cells getting rid of the "<"
// Then convert the column to a numeric column
For( i = 1, i <= N Items( colNames ), i++,
dt << select where( ! Contains( As Column( colNames[i] ), "<0.0" ) );
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
column( dt, colNames[i] )[selRows] = "0";
);
dt << invert row selection;
selRows = dt << get selected rows;
If( N Rows( selRows ) > 0,
For( k = 1, k <= N Rows( selRows ), k++,
theRow = selRows[k];
Column( dt, colNames[i] )[theRow] = Substr( Column( dt, colNames[i] )[theRow], 2 );
);
Column( dt, colNames[i] ) << data type( numeric ) << modeling type( continuous );
);
);
// Save the NonDetect version of the data
dt << save( "<your path>\NonDetect.csv" );
Close( dt, nosave );
```

Jim

Highlighted
##

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

Re: Data QA and Reformatting