Subscribe Bookmark RSS Feed

Table Wide Find and Replace in a Script

dhmeakin

Community Trekker

Joined:

Jul 28, 2012

I have a several large data tables (+100,000 records) that were created by a colleague in Europe.

When the tables were created the EU numbering format was used, i.e. the "," is the decimal point.

I need to do a table wide substitution to replace the "," with a "." in order for the table to work with existing scripts.

I can get the column names with a get column names() statement, but I can not figure out how to use the substitute(...) or substitute into(...)  statements to do a replacement on mass.

David

1 ACCEPTED SOLUTION

Accepted Solutions
pmroz

Super User

Joined:

Jun 23, 2011

Solution

Here's an example with a large table from the sample dataset.  It only has 5800 rows and takes 8 seconds.  JMP 11 64-bit on a Lenovo T420.  It would be interesting to compare different approaches.

dt = open("$Sample_Data\Probe.jmp");

col_names = dt << get column names (numeric, String);

for (i = 1, i <= nitems(col_names), i++,

    column(dt, col_names[i]) << data type(Character);

);

start_dt = today();

nr = nrows(dt);

col_names = dt << get column names (String);

for (i = 1, i <= nitems(col_names), i++,

    one_col = column(dt, col_names[i]);

    for (k = 1, k <= nr, k++,

        one_col[k] = substitute(one_col[k], ".", ",");    // Replace decimal with comma

    );

);

end_dt = today();

print(char(end_dt - start_dt) || " seconds for " || char(nr) || " rows");

7 REPLIES
jwweaver

Community Trekker

Joined:

Jun 4, 2014

Hi David,

**After typing all of this up, I realized the original question was posted a LONG time ago.  However, my "more efficient" question still remains if anyone can help with that. **

I am trying to figure out this same thing.  One solution is to employ a For Loop and working through each column individually, but this can take a considerable amount of time depending on your dataset.  Example code is listed below for your reference.

dt = current data table();

 

For(i=1,i<=ncol(dt),i++,

     colname = Column(i) << get name;

     newcolname = munger(colname,1,"replacestring",""); // You could do a similar thing with the "Substitute" command

     Column(i) << Set Name(newcolname);
);

Does anyone know of a more efficient way than a For Loop to take care of this?  For example, when using the standard "Find and Replace" dialog box, this executes very quickly.  However, on ~65,000 columns, the For Loop approach takes a considerable amount of time.

Thanks,

Jake

pmroz

Super User

Joined:

Jun 23, 2011

Solution

Here's an example with a large table from the sample dataset.  It only has 5800 rows and takes 8 seconds.  JMP 11 64-bit on a Lenovo T420.  It would be interesting to compare different approaches.

dt = open("$Sample_Data\Probe.jmp");

col_names = dt << get column names (numeric, String);

for (i = 1, i <= nitems(col_names), i++,

    column(dt, col_names[i]) << data type(Character);

);

start_dt = today();

nr = nrows(dt);

col_names = dt << get column names (String);

for (i = 1, i <= nitems(col_names), i++,

    one_col = column(dt, col_names[i]);

    for (k = 1, k <= nr, k++,

        one_col[k] = substitute(one_col[k], ".", ",");    // Replace decimal with comma

    );

);

end_dt = today();

print(char(end_dt - start_dt) || " seconds for " || char(nr) || " rows");

ms

Super User

Joined:

Jun 23, 2011

With JMP 11 on a MacBook Pro I get "2 seconds for 5800 rows" or 1.753558 seconds if using HP time( ) instead of Today( ).

In contrast to what's claimed above, the built-in find and replace is much slower here. The code below takes 16.27 seconds to complete (code assumes that the search dialog is "prepared" by a previous manual search).

dt = open("$Sample_Data\Probe.jmp");

start_dt = HP time();

Main Menu( "Replace all" );

end_dt = HP time();

print(char((end_dt - start_dt)/1e6) || " seconds for " || char(nr) || " rows");

jwweaver

Community Trekker

Joined:

Jun 4, 2014

Maybe I've confused what David was asking based on his comment of "Get column Names".  I was under the impression he wants to change column names (not actual row data).  In any event, that is what I am trying to do.  I want to change the name of multiple columns (i.e., remove "labdata" from every column name).  That is what my script above will do.  In JSL it takes ~19 sec on 85,000 columns.  The "Search Dialog" takes only a fraction of a second.

Any ideas on how to do that without a For Loop (or a super efficient one)?

Thanks,

Jake

ms

Super User

Joined:

Jun 23, 2011

I tried to invoke the search dialog using the below approach but it is not working reliably. Seems like the dialog get disconnected when settings are edited by script. But maybe someone can find a way to improve the code.

If this unusually wide file originates from outside JMP, it may be more efficient to edit column names before import e.g. by a perl or VBA script that possibly can be triggered from JSL (may depend on platform).

dt= current data table();

old="C";

New="Q";

dt << Bring Window To Front;

// Invoke search dialog and edit settings

Main Menu( "Find" );

w=window("Search Dialog");

w[Texteditbox(1)]<<set text(old);

w[Texteditbox(2)]<<set text(new);

if(w[Check Box Box( 2 )] << get, w[Check Box Box( 2 )] << set(1));

if(w[Check Box Box( 6 )] << get, w[Check Box Box( 6 )] << set(1));

if((w[Check Box Box( 7 )] << get)==0, w[Check Box Box( 7 )] << set(1));

w[buttonbox(3)]<<Click;

//Main Menu( "Replace All" );

pmroz

Super User

Joined:

Jun 23, 2011

Interesting discussion.  I played around with some code and got 49 seconds for a global column rename of 65394 columns.  Lenovo T420 PC, Windows 7 64-bit, JMP 64-bit, 4 GB RAM.

dt = open("$sample_data\Probe.jmp");

// Extend the number of columns to 65000

// WARNING takes ~10 minutes!

for (i = 1, i <= 65000, i++,

    dt << new column("XYZ" || char(i), Numeric, Continuous);

);

start_dt = today();

col_name_list = dt << get column names(string);

nc = ncols(dt) ;

For (i = 1, i <= nc, i++,

    one_col_name = col_name_list[i];

// Only rename the column if it contains the string in question

    if (contains(one_col_name, "X"),

        column(i) << set name(substitute(one_col_name, "X", "D"));

    );

);

end_dt = today();

print(char(end_dt - start_dt) || " seconds for " || char(nc) || " columns");

jan_akerstrom

Community Trekker

Joined:

Jan 14, 2015

Hi David,

I had the same but reverse problem - a global replacement of point to comma delimiters in a series of process log files.

I found the easiest way to get out of this was to save each jmp file as a text file, and then import it using JMP: s "best guess" option.

It replaced the original delimiters with the one expected by the system (in my case “.” was replaced with  “,” )  and reclassified the columns from categorical to continous. Did one manually and used the source for a script processing the rest.