Again, I'm amazed at Jim's ability to get to the heart of the question. Two nice bits of code here deserve a longer explanation:
What does this do?
name_key = associative array(column(ref_file_1, "NAME") << get values) << get keys;
ref_file_1 is a data table reference, "NAME" is a column's name as a string (literally "NAME", but it could be held in a variable), and the column function returns a column reference. The <<get values message gets all the values from the column, including duplicates. The associative array function builds a set of the unique values, and the <<get keys message returns those unique values as a list. (Nicely done @29755.)
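Here's a tiny sketch of the same dedup trick on a literal list (my own toy data, not from the original script):

names = {"Anna", "Bob", "Anna", "Cara", "Bob"};        // hypothetical list with duplicates
unique_names = Associative Array(names) << get keys;   // unique keys, in sorted order: {"Anna", "Bob", "Cara"}
show(unique_names);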
and this?
merged_file << select where(nrows(loc(name_key,:Name_final))>0);
The <<select where message operates on each row of merged_file, one at a time, and selects a row when the argument evaluates to non-zero.
The loc() function returns a matrix of indexes. It searches the list in name_key for occurrences of :Name_final. The resulting matrix can have zero or more rows, though in this example it will have 0 or 1 rows because the keys are unique. The nrows() function therefore returns 0 or 1, and comparing that count with >0 still yields 0 or 1. So a row is selected when its :Name_final value appears in the list of ref_file_1:NAME values.
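A toy illustration (mine, untested) of how loc() and nrows() drive the selection:

key_list = {"Anna", "Bob", "Cara"};        // stands in for name_key
show(nrows(loc(key_list, "Bob")));         // 1 -> that row would be selected
show(nrows(loc(key_list, "Zed")));         // 0 -> that row would be skipped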
Alternative:
What if there were >1e6 names in name_key and >1e7 rows in merged_file? What would the performance look like? The loc function would do a lot of searching: roughly 1e13 string comparisons. You'd get tired of waiting. The fix is to use the associative array directly, like this:
nameSet = associative array(column(ref_file_1, "NAME") << get values);
merged_file << select where(contains(nameSet, :Name_final));
Now there are only 1e7 lookups in nameSet. They are not free, but they are much faster than checking 1e6 items sequentially. I'll guess about 20 compares each (log2(1e6)). This may also be easier for the next maintainer to follow.
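A small sketch (again mine, untested) of what the contains() lookup does on an associative array:

nameSet = Associative Array({"Anna", "Bob", "Cara"});   // stands in for the ref_file_1:NAME values
show(contains(nameSet, "Bob"));                          // 1 -> row kept
show(contains(nameSet, "Zed"));                          // 0 -> row dropped

Because the associative array is built once, each row's test is a direct key lookup rather than a scan of the whole list.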
(Untested. Corrections welcome!)
Craige