Still playing with Twitter trends. I have enough data collected that I'd like to speed up processing. I want to assign an integer index to each trending word by using an associative array keyed by the trend word. Below are three techniques for adding a new word to the table or getting the index for an existing word.
The first one uses TRY to catch the exception when a word is not in the table. It is the fastest if the word is present and slowest (by far) if the exception happens.
The second uses CONTAINS to test if the word is present, then adds it or looks it up. It is the fastest for adding and slowest for looking up (because it looks up twice).
The last uses an associative array default value to see if the retrieved value already exists. It is a nice compromise on speed.
All the examples have a "t" loop that runs twice. The first time the xN array is empty and must be filled in. The second time all the values will be already present.
explain = {"create new entry","use existing entry"};
N = 1e5;
// Method 1: Try. Read the key; if it is missing, the exception handler adds it.
x1 = [=> ];
For( t = 1, t <= 2, t += 1,
    start = HP Time(); // HP Time() is in microseconds
    For( ix = 1, ix < N, ix += 1,
        Try( y = x1[ix], y = (x1[ix] = ix) )
    );
    stop = HP Time();
    Write( Eval Insert( "\!nTry method: ^char((stop - start)/1e6,5,3)^ seconds ^explain[t]^" ) );
);
// Method 2: Contains. Test for the key first, then look it up or add it.
x2 = [=> ];
For( t = 1, t <= 2, t += 1,
    start = HP Time();
    For( ix = 1, ix < N, ix += 1,
        If( Contains( x2, ix ),
            y = x2[ix],
            y = (x2[ix] = ix)
        )
    );
    stop = HP Time();
    Write( Eval Insert( "\!nContains method: ^char((stop - start)/1e6,5,3)^ seconds ^explain[t]^" ) );
);
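// Method 3: default value. A missing key returns the default (0), which means "not added yet".
// This relies on 0 never being stored as a real value; the keys here start at 1.
x3 = [=> 0];
For( t = 1, t <= 2, t += 1,
    start = HP Time();
    For( ix = 1, ix < N, ix += 1,
        y = x3[ix];
        If( !y, y = (x3[ix] = ix) );
    );
    stop = HP Time();
    Write( Eval Insert( "\!ndefault 0 method: ^char((stop - start)/1e6,5,3)^ seconds ^explain[t]^" ) );
);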
// x3 has a default value of 0; set the same default on x1 and x2 before comparing all three.
x1 << setdefaultvalue( 0 );
x2 << setdefaultvalue( 0 );
Show( x1 == x2, x1 == x3 );
Try method: 2.851 seconds create new entry
Try method: 0.099 seconds use existing entry
Contains method: 0.209 seconds create new entry
Contains method: 0.151 seconds use existing entry
default 0 method: 0.259 seconds create new entry
default 0 method: 0.117 seconds use existing entry
x1 == x2 = 1;
x1 == x3 = 1;
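The log above can be turned into a rough break-even estimate. This is only a back-of-the-envelope sketch: it assumes cost grows linearly with the number of operations, it uses this single run with integer keys, and the symbols $c$ (entries created) and $\ell$ (lookups of existing entries), both counted in units of the roughly 1e5 operations each pass timed, are introduced here for illustration.

```latex
\begin{align*}
T_{\text{Try}}      &\approx 2.851\,c + 0.099\,\ell \\
T_{\text{Contains}} &\approx 0.209\,c + 0.151\,\ell \\
T_{\text{Default}}  &\approx 0.259\,c + 0.117\,\ell \\[4pt]
T_{\text{Contains}} < T_{\text{Default}}
  &\iff 0.034\,\ell < 0.050\,c
  \iff \ell / c < 1.47 \\
T_{\text{Try}} < T_{\text{Default}}
  &\iff 2.592\,c < 0.018\,\ell
  \iff \ell / c > 144
\end{align*}
```

Under those assumptions, CONTAINS wins only when each entry is looked up fewer than about 1.5 times after it is created, TRY wins only beyond roughly 144 lookups per created entry, and the default-value method is cheapest in between. Timings with string keys on another machine may shift these thresholds.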
TRY gives the fastest time for existing entries (0.099 seconds) and CONTAINS gives the fastest time for new entries (0.209 seconds). DEFAULT 0 (0.259 and 0.117 seconds) might be a good compromise. It really depends on how many entries will be created and how many times they will be looked up.
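As a minimal sketch of how the default-value technique might look with actual words instead of the integer keys used for timing: the word list, the variable names (words, wordIndex, nextId, ids) and the running counter are all assumptions for illustration, not part of the benchmark above.

```jsl
// Hypothetical sample data; real trend words would come from the collected data.
words = {"jmp", "stats", "jmp", "data", "stats", "jmp"};

// Default value 0 means "not seen yet", so real indexes start at 1.
wordIndex = [=> 0];
nextId = 0;   // last index handed out
ids = {};     // index assigned to each word, in input order

For( i = 1, i <= N Items( words ), i += 1,
    w = words[i];
    id = wordIndex[w];        // 0 if w has not been added
    If( id == 0,
        nextId += 1;          // allocate the next unused index
        id = nextId;
        wordIndex[w] = id;
    );
    Insert Into( ids, id );
);

Show( ids, nextId );
```

Keeping a separate counter means the index does not depend on how many keys the array reports, and repeated words simply take the lookup path on every encounter after the first.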