Solved: Table Wide Find and Replace in a Script

dhmeakin · Feb 5, 2013 01:29 PM

I have a several large data tables (+100,000 records) that were created by a colleague in Europe.

When the tables were created the EU numbering format was used, i.e. the "," is the decimal point.

I need to do a table wide substitution to replace the "," with a "." in order for the table to work with existing scripts.

I can get the column names with a get column names() statement, but I can not figure out how to use the substitute(...) or substitute into(...) statements to do a replacement on mass.

David

pmroz · Jun 4, 2014 04:44 PM

Here's an example with a large table from the sample dataset. It only has 5800 rows and takes 8 seconds. JMP 11 64-bit on a Lenovo T420. It would be interesting to compare different approaches.

dt = open("$Sample_Data\Probe.jmp");

col_names = dt << get column names (numeric, String);

for (i = 1, i <= nitems(col_names), i++,

column(dt, col_names[i]) << data type(Character);

);

start_dt = today();

nr = nrows(dt);

col_names = dt << get column names (String);

for (i = 1, i <= nitems(col_names), i++,

one_col = column(dt, col_names[i]);

for (k = 1, k <= nr, k++,

one_col[k] = substitute(one_col[k], ".", ","); // Replace decimal with comma

);

end_dt = today();

print(char(end_dt - start_dt) || " seconds for " || char(nr) || " rows");

View solution in original post

jwweaver · Jun 4, 2014 01:52 PM

Hi David,

**After typing all of this up, I realized the original question was posted a LONG time ago. However, my "more efficient" question still remains if anyone can help with that. **

I am trying to figure out this same thing. One solution is to employ a For Loop and working through each column individually, but this can take a considerable amount of time depending on your dataset. Example code is listed below for your reference.

dt = current data table();

For(i=1,i<=ncol(dt),i++,

colname = Column(i) << get name;

newcolname = munger(colname,1,"replacestring",""); // You could do a similar thing with the "Substitute" command

Column(i) << Set Name(newcolname);
);

Does anyone know of a more efficient way than a For Loop to take care of this? For example, when using the standard "Find and Replace" dialog box, this executes very quickly. However, on ~65,000 columns, the For Loop approach takes a considerable amount of time.

Thanks,

Jake

pmroz · Jun 4, 2014 04:44 PM

Here's an example with a large table from the sample dataset. It only has 5800 rows and takes 8 seconds. JMP 11 64-bit on a Lenovo T420. It would be interesting to compare different approaches.

dt = open("$Sample_Data\Probe.jmp");

col_names = dt << get column names (numeric, String);

for (i = 1, i <= nitems(col_names), i++,

column(dt, col_names[i]) << data type(Character);

);

start_dt = today();

nr = nrows(dt);

col_names = dt << get column names (String);

for (i = 1, i <= nitems(col_names), i++,

one_col = column(dt, col_names[i]);

for (k = 1, k <= nr, k++,

one_col[k] = substitute(one_col[k], ".", ","); // Replace decimal with comma

);

end_dt = today();

print(char(end_dt - start_dt) || " seconds for " || char(nr) || " rows");

ms · Jun 4, 2014 05:19 PM

With JMP 11 on a MacBook Pro I get "2 seconds for 5800 rows" or 1.753558 seconds if using HP time( ) instead of Today( ).

In contrast to what's claimed above, the built-in find and replace is much slower here. The code below takes 16.27 seconds to complete (code assumes that the search dialog is "prepared" by a previous manual search).

dt = open("$Sample_Data\Probe.jmp");

start_dt = HP time();

Main Menu( "Replace all" );

end_dt = HP time();

print(char((end_dt - start_dt)/1e6) || " seconds for " || char(nr) || " rows");

jwweaver · Jun 4, 2014 06:34 PM

Maybe I've confused what David was asking based on his comment of "Get column Names". I was under the impression he wants to change column names (not actual row data). In any event, that is what I am trying to do. I want to change the name of multiple columns (i.e., remove "labdata" from every column name). That is what my script above will do. In JSL it takes ~19 sec on 85,000 columns. The "Search Dialog" takes only a fraction of a second.

Any ideas on how to do that without a For Loop (or a super efficient one)?

Thanks,

Jake

ms · Jun 5, 2014 11:08 AM

I tried to invoke the search dialog using the below approach but it is not working reliably. Seems like the dialog get disconnected when settings are edited by script. But maybe someone can find a way to improve the code.

If this unusually wide file originates from outside JMP, it may be more efficient to edit column names before import e.g. by a perl or VBA script that possibly can be triggered from JSL (may depend on platform).

dt= current data table();

old="C";

New="Q";

dt << Bring Window To Front;

// Invoke search dialog and edit settings

Main Menu( "Find" );

w=window("Search Dialog");

w[Texteditbox(1)]<<set text(old);

w[Texteditbox(2)]<<set text(new);

if(w[Check Box Box( 2 )] << get, w[Check Box Box( 2 )] << set(1));

if(w[Check Box Box( 6 )] << get, w[Check Box Box( 6 )] << set(1));

if((w[Check Box Box( 7 )] << get)==0, w[Check Box Box( 7 )] << set(1));

w[buttonbox(3)]<<Click;

//Main Menu( "Replace All" );

pmroz · Jun 5, 2014 02:14 PM

Interesting discussion. I played around with some code and got 49 seconds for a global column rename of 65394 columns. Lenovo T420 PC, Windows 7 64-bit, JMP 64-bit, 4 GB RAM.

dt = open("$sample_data\Probe.jmp");

// Extend the number of columns to 65000

// WARNING takes ~10 minutes!

for (i = 1, i <= 65000, i++,

dt << new column("XYZ" || char(i), Numeric, Continuous);

);

start_dt = today();

col_name_list = dt << get column names(string);

nc = ncols(dt) ;

For (i = 1, i <= nc, i++,

one_col_name = col_name_list[i];

// Only rename the column if it contains the string in question

if (contains(one_col_name, "X"),

column(i) << set name(substitute(one_col_name, "X", "D"));

);

end_dt = today();

print(char(end_dt - start_dt) || " seconds for " || char(nc) || " columns");

jan_akerstrom · Sep 6, 2015 01:34 PM

Hi David,

I had the same but reverse problem - a global replacement of point to comma delimiters in a series of process log files.

I found the easiest way to get out of this was to save each jmp file as a text file, and then import it using JMP: s "best guess" option.

It replaced the original delimiters with the one expected by the system (in my case “.” was replaced with “,” ) and reclassified the columns from categorical to continous. Did one manually and used the source for a script processing the rest.

Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script

Re: Table Wide Find and Replace in a Script