BookmarkSubscribe
Choose Language Hide Translation Bar
tomerFire
Occasional Contributor

Selecting multi rows based on other table value

Hello, Im trying to select valuse from one table based on other table rows.

My current code use loop but its really slow on big tables and I want to find a way to make it more efficient.

My goal is to remove all the outliers from my data so I first calculate the quantiles and then loop them over the main table to find the matching rows. At the end I delete this rows.

 

Is there a way to use my conditional "Select Where" logic without looping? Maybe something like but with logic. thanks!

dt << Select Where( Contains( myList, :age ) );

 

disturbation_table = dt << Summary( Group( :NAME, :ENTITY ), Quantiles( 1, :VALUE ), Quantiles( 99, :VALUE ), Link to original data table( 0 ));

n_rows = NRow(disturbation_table);

//iterate all the disturbation data
For(i = 1, i <= n_rows, i++,

	name_value = Column(disturbation_table, "NAME")[i];
	entity_value = Column(disturbation_table, "ENTITY")[i];
	QuantilesUp = Column(disturbation_table, "Quantiles99(VALUE)")[i];
	QuantilesDown = Column(disturbation_table, "Quantiles1(VALUE)")[i];

		try(
		
		outliers_rows_to_delete = dt << Select Where(dt:NAME == name_value & dt:ENTITY == entity_value & (dt:VALUE >= QuantilesUp | dt:VALUE <= QuantilesDown), current selection("extend"));
		);	
);

dt << Delete Rows(outliers_rows_to_delete);	
0 Kudos
3 REPLIES 3
Highlighted
ian_jmp
Staff

Re: Selecting multi rows based on other table value

I would expect this approach might be quicker:

NamesDefaultToHere(1);

// Make a table to mess with
n = 1000;
numName = 20;
numEntity = 10;
mat = J(n, 1, RandomInteger(numName)) || J(n, 1, RandomInteger(numEntity)) || J(n, 1, RandomNormal());
dt = asTable(mat, << columnNames({"Name", "Entity", "Value"}));
dt << setName("Data");

// Find the quantiles for the 'by' groups
upperCol = dt << NewColumn("99% Quantile", Numeric, Continuous, Formula(colQuantile(:Value, 0.99, :Name, :Entity)));
lowerCol = dt << NewColumn("1% Quantile", Numeric, Continuous, Formula(colQuantile(:Value, 0.01, :Name, :Entity)));
dt << runFormulas;
upperCol << deleteFormula;
lowerCol << deleteFormula;

// Select outliers rows
Wait(3);
outlierRows = dt << getRowsWhere(:Value >= :Name("99% Quantile") | :Value <= :Name("1% Quantile") );
dt << selectRows(outlierRows);

// Remove them?
// Column(dt, "Value")[outlierRows] = .;

Of course there is more than one definition of 'outllier', and more than one way to handle them . . .

0 Kudos
tomerFire
Occasional Contributor

Re: Selecting multi rows based on other table value

Its only calculate the first rows quarantines.
I want to get the data group by name and entity.
0 Kudos
ian_jmp
Staff

Re: Selecting multi rows based on other table value

The command:

colQuantile(:Value, 0.99, :Name, :Entity)

uses :Name and :Entity as 'by' variables, so should get you what you want. Look for this command in 'Help > Scripting Index'. Or perhaps I've not understood.

0 Kudos