Choose Language Hide Translation Bar
Level II

Selecting multi rows based on other table value

Hello, Im trying to select valuse from one table based on other table rows.

My current code use loop but its really slow on big tables and I want to find a way to make it more efficient.

My goal is to remove all the outliers from my data so I first calculate the quantiles and then loop them over the main table to find the matching rows. At the end I delete this rows.


Is there a way to use my conditional "Select Where" logic without looping? Maybe something like but with logic. thanks!

dt << Select Where( Contains( myList, :age ) );


disturbation_table = dt << Summary( Group( :NAME, :ENTITY ), Quantiles( 1, :VALUE ), Quantiles( 99, :VALUE ), Link to original data table( 0 ));

n_rows = NRow(disturbation_table);

//iterate all the disturbation data
For(i = 1, i <= n_rows, i++,

	name_value = Column(disturbation_table, "NAME")[i];
	entity_value = Column(disturbation_table, "ENTITY")[i];
	QuantilesUp = Column(disturbation_table, "Quantiles99(VALUE)")[i];
	QuantilesDown = Column(disturbation_table, "Quantiles1(VALUE)")[i];

		outliers_rows_to_delete = dt << Select Where(dt:NAME == name_value & dt:ENTITY == entity_value & (dt:VALUE >= QuantilesUp | dt:VALUE <= QuantilesDown), current selection("extend"));

dt << Delete Rows(outliers_rows_to_delete);	

Re: Selecting multi rows based on other table value

I would expect this approach might be quicker:


// Make a table to mess with
n = 1000;
numName = 20;
numEntity = 10;
mat = J(n, 1, RandomInteger(numName)) || J(n, 1, RandomInteger(numEntity)) || J(n, 1, RandomNormal());
dt = asTable(mat, << columnNames({"Name", "Entity", "Value"}));
dt << setName("Data");

// Find the quantiles for the 'by' groups
upperCol = dt << NewColumn("99% Quantile", Numeric, Continuous, Formula(colQuantile(:Value, 0.99, :Name, :Entity)));
lowerCol = dt << NewColumn("1% Quantile", Numeric, Continuous, Formula(colQuantile(:Value, 0.01, :Name, :Entity)));
dt << runFormulas;
upperCol << deleteFormula;
lowerCol << deleteFormula;

// Select outliers rows
outlierRows = dt << getRowsWhere(:Value >= :Name("99% Quantile") | :Value <= :Name("1% Quantile") );
dt << selectRows(outlierRows);

// Remove them?
// Column(dt, "Value")[outlierRows] = .;

Of course there is more than one definition of 'outllier', and more than one way to handle them . . .

Level II

Re: Selecting multi rows based on other table value

Its only calculate the first rows quarantines.
I want to get the data group by name and entity.

Re: Selecting multi rows based on other table value

The command:

colQuantile(:Value, 0.99, :Name, :Entity)

uses :Name and :Entity as 'by' variables, so should get you what you want. Look for this command in 'Help > Scripting Index'. Or perhaps I've not understood.

Article Labels

    There are no labels assigned to this post.