BookmarkSubscribeRSS Feed
Choose Language Hide Translation Bar
tomerFire

Occasional Contributor

Joined:

Oct 11, 2018

Selecting multi rows based on other table value

Hello, Im trying to select valuse from one table based on other table rows.

My current code use loop but its really slow on big tables and I want to find a way to make it more efficient.

My goal is to remove all the outliers from my data so I first calculate the quantiles and then loop them over the main table to find the matching rows. At the end I delete this rows.

 

Is there a way to use my conditional "Select Where" logic without looping? Maybe something like but with logic. thanks!

dt << Select Where( Contains( myList, :age ) );

 

disturbation_table = dt << Summary( Group( :NAME, :ENTITY ), Quantiles( 1, :VALUE ), Quantiles( 99, :VALUE ), Link to original data table( 0 ));

n_rows = NRow(disturbation_table);

//iterate all the disturbation data
For(i = 1, i <= n_rows, i++,

	name_value = Column(disturbation_table, "NAME")[i];
	entity_value = Column(disturbation_table, "ENTITY")[i];
	QuantilesUp = Column(disturbation_table, "Quantiles99(VALUE)")[i];
	QuantilesDown = Column(disturbation_table, "Quantiles1(VALUE)")[i];

		try(
		
		outliers_rows_to_delete = dt << Select Where(dt:NAME == name_value & dt:ENTITY == entity_value & (dt:VALUE >= QuantilesUp | dt:VALUE <= QuantilesDown), current selection("extend"));
		);	
);

dt << Delete Rows(outliers_rows_to_delete);	
3 REPLIES 3
ian_jmp

Staff

Joined:

Jun 23, 2011

Re: Selecting multi rows based on other table value

I would expect this approach might be quicker:

NamesDefaultToHere(1);

// Make a table to mess with
n = 1000;
numName = 20;
numEntity = 10;
mat = J(n, 1, RandomInteger(numName)) || J(n, 1, RandomInteger(numEntity)) || J(n, 1, RandomNormal());
dt = asTable(mat, << columnNames({"Name", "Entity", "Value"}));
dt << setName("Data");

// Find the quantiles for the 'by' groups
upperCol = dt << NewColumn("99% Quantile", Numeric, Continuous, Formula(colQuantile(:Value, 0.99, :Name, :Entity)));
lowerCol = dt << NewColumn("1% Quantile", Numeric, Continuous, Formula(colQuantile(:Value, 0.01, :Name, :Entity)));
dt << runFormulas;
upperCol << deleteFormula;
lowerCol << deleteFormula;

// Select outliers rows
Wait(3);
outlierRows = dt << getRowsWhere(:Value >= :Name("99% Quantile") | :Value <= :Name("1% Quantile") );
dt << selectRows(outlierRows);

// Remove them?
// Column(dt, "Value")[outlierRows] = .;

Of course there is more than one definition of 'outllier', and more than one way to handle them . . .

tomerFire

Occasional Contributor

Joined:

Oct 11, 2018

Re: Selecting multi rows based on other table value

Its only calculate the first rows quarantines.
I want to get the data group by name and entity.
Highlighted
ian_jmp

Staff

Joined:

Jun 23, 2011

Re: Selecting multi rows based on other table value

The command:

colQuantile(:Value, 0.99, :Name, :Entity)

uses :Name and :Entity as 'by' variables, so should get you what you want. Look for this command in 'Help > Scripting Index'. Or perhaps I've not understood.