cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
  • JMP 19 is here! See the new features at jmp.com/new.
  • Register to attend Discovery Summit 2025 Online: Early Users Edition, Sept. 24-25.
Choose Language Hide Translation Bar

Quickest way to iterate over a data table

Hello,

We have a JMP table with millions of rows (image data), and each operation is costly and time-consuming. However, we need to add a new column that will be calculated from the other columns.

What is the fastest way to do this with large data sets? Currently, we are using Set Formula on the column, but it takes a long time to apply and, what's more, the formula is bound to be changed several times during the analysis. Should we use ‘get values’ vectors or something like that? I have also seen the ‘for each rows’ function. I would appreciate any feedback and tips you may have.
3 REPLIES 3
txnelson
Super User

Re: Quickest way to iterate over a data table

Many times, moving away from conventional row by row JSL in favor of using matrix operations will result in faster processing.  However, it would be helpful if you could provide an example of the actual formulas you are using.

Jim

Re: Quickest way to iterate over a data table

Hello, 

 

Thank you for your response.

To add more context: Below is a script with an example of the formula I use. Based on the X/Y coordinates on the image, I divide it into zones, 20 in total on my real date. If the classification of the zones is not correct, I will modify the formula, which is why I say that we will frequently use set formulas.

 

dt = New Table( "Reprex",
    Add Rows( 1000000 ),
    New Column( "X", Numeric, Continuous, Formula( Random Integer( 0, 200 ) ) ),
    New Column( "Y", Numeric, Continuous, Formula( Random Integer( 0, 200 ) ) )
);

R_debut_X = 10;
largeur   = 50;
ecart_X   = 5;
debut_Y   = 20;
hauteur   = 30;
ecart_Y   = 5;

// Slow way: row formula
dt << New Column( "Zone_slow", Character,
    Formula(
        If( R_debut_X < :X <= R_debut_X + largeur & debut_Y < :Y <= debut_Y + hauteur, "Zone 1",
            R_debut_X + largeur + ecart_X < :X <= R_debut_X + 2*largeur + ecart_X & debut_Y < :Y <= debut_Y + hauteur, "Zone 2",
            "Other"
        )
    )
);

 

Craige_Hales
Super User

Re: Quickest way to iterate over a data table

@txnelson is right, a matrix might improve the speed. Make sure you are using <<runformulas first; that might be fast enough.

dt = New Table( "Reprex",
    Add Rows( 1000000 ),
    New Column( "X", Numeric, Continuous, Formula( Random Integer( 0, 200 ) ) ),
    New Column( "Y", Numeric, Continuous, Formula( Random Integer( 0, 200 ) ) )
);

R_debut_X = 10;
largeur   = 50;
ecart_X   = 5;
debut_Y   = 20;
hauteur   = 30;
ecart_Y   = 5;
start = hptime();
// Slow way: row formula
dt << New Column( "Zone_slow", Character,
    Formula(
        If( R_debut_X < :X <= R_debut_X + largeur & debut_Y < :Y <= debut_Y + hauteur, "Zone 1",
            R_debut_X + largeur + ecart_X < :X <= R_debut_X + 2*largeur + ecart_X & debut_Y < :Y <= debut_Y + hauteur, "Zone 2",
            "Other"
        )
    )
);
dt<<runformulas; // <<<<<<<<<<<< important <<<<<<<<
stop=hptime();
show((stop-start)/1e6);// 1.3 sec

start = hptime();
dmatx = dt[0,{x}];
dmaty = dt[0,{y}];
cmatx1 = R_debut_X < dmatx <= R_debut_X + largeur;
cmaty1 = debut_Y < dmaty <= debut_Y + hauteur;
cmatxy1 = cmatx1 & cmaty1;
cmatx2 = R_debut_X + largeur + ecart_X < dmatx <= R_debut_X + 2*largeur + ecart_X;
cmaty2 = debut_Y < dmaty <= debut_Y + hauteur;
cmatxy2 = cmatx2 & cmaty2;
result = cmatxy1 + cmatxy2*2;
stop=hptime();
dt<<newcolumn("Zone_fast", values(result));
show((stop-start)/1e6);// .25 sec

Craige_Hales_0-1758807317326.png

 

The result value is using the 0 or 1 truth value in cmatxy1,2 to build the 0, 1, or 2 value for other, zone1, zone2. You could probably collapse some of the matrix statements to make it still faster if that is important.

 

Craige

Recommended Articles