I have a dataset of 10,000 rows x 100+ columns used for neural network modeling. I have built 75 different models (20 nodes x 100 inputs per node) and saved the formulas using Save Fast Formulas. I have 10 different tables (same dataset, different modeling), each with 75 models, and I want to merge all of the models (formulas) into one master table, since I want to apply all of the models to any new data I collect.
Every method I try takes forever. I have used the Copy Formulas add-in to copy each table's formulas into a single table, and it takes about 2 days total to complete all of the table merges; with no indication of progress, it's hard to know whether it is still working or has frozen.
I have also used Join, matching rows with formula evaluation suppressed, to get them all into one table, and that part is fast. However, when I turn formula evaluation back on (manually selecting the formula columns and using Columns > Standardize Attributes > Column Properties > Formula > Suppress Eval), the recalculation of the 750 formulas takes forever. It has been running for 30 hours (maxed out at 12% processor utilization) and I have no idea what the progress is.
I have also tried looping through the columns one at a time to turn formula evaluation back on, but even a single column took over 5 minutes, which seems incredibly long.
Is there any way to speed this up? I know there are workarounds, but this seems like it should be much easier and faster than it is.
With 'larger' problems it's hard to offer general advice about efficiency without knowing some or all of the details. If you haven't already, you could look at the impact of the 'Begin Data Update()' and 'End Data Update()' table messages, and/or at controlling the way formulas are evaluated (or not). FWIW, my instinct is that you may do better by using the Formula Depot as an intermediary. I started to write some code to check this out, but for problems of this size the modelling itself takes some time, so I didn't continue down this path:
```jsl
NamesDefaultToHere( 1 );

// Define the data volume
nr = 100;
nc = 10;

// Make some random data
dt = AsTable( J( nr, nc, RandomNormal( 0, 1 ) ) );
dt << setName( "Random Data" );

// Fit a neural net and publish the prediction formula
cols = dt << getColumnNames( "String" );
yCol = cols[NItems( cols )];
RemoveFrom( cols, NItems( cols ) );
nn = dt << Neural( Y( Eval( yCol ) ), X( Eval( cols ) ), Fit( NTanH( 3 ) ), Invisible );
nn << (Fit( 1 ) << publishPredictionFormula);
Report( nn ) << closeWindow;
```
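On the Begin Data Update()/End Data Update() idea, here is a minimal sketch of how I would try re-enabling evaluation on all of the suppressed formula columns in one pass. This is untested at the scale of 750 formulas, and I'm not certain End Data Update alone triggers a recalculation, so Rerun Formulas is there as a belt-and-braces step:

```jsl
NamesDefaultToHere( 1 );
dt = CurrentDataTable();

// Collect references to the columns that carry a formula
colRefs = dt << getColumnReferences;
formulaCols = {};
For( i = 1, i <= NItems( colRefs ), i++,
	If( !IsEmpty( colRefs[i] << getFormula ),
		InsertInto( formulaCols, colRefs[i] )
	)
);

// Suspend table updates while evaluation is switched back on,
// then ask for a single recalculation pass at the end
dt << beginDataUpdate;
For( i = 1, i <= NItems( formulaCols ), i++,
	formulaCols[i] << suppressEval( 0 )
);
dt << endDataUpdate;
dt << rerunFormulas;
```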
Thanks for the rapid response. I just tried the Formula Depot method (Add Formula From Column with all of the formulas) and the script creation was fast. When I tested Run Script (running 6 scripts) on a duplicate table with the formula columns removed, it was very fast. So far so good. But when I then ran all of the scripts on the same table, it started fast and then progressively slowed down as more columns were created (I could see the columns appearing in the Columns pane). It eventually appeared to freeze (no new columns created in 5 minutes). Hmmmm. I am now going to try doing it in small batches to see if that helps.
I wonder if the formulas are all being recalculated simultaneously and are competing for processor time.
I tried Run Script from the Formula Depot on 100 formulas at a time. The first 100 ran reasonably fast. The second 100 slowed to a crawl and appeared to stop.
I also tried copying the scripts for all of the formulas from the Formula Depot and wrapping them in << Begin Data Update ... << End Data Update.
I tried running Suppress Eval(0) on formula columns that had been copied with Suppress Eval(1). I looped through the columns, wrapped in << Begin Data Update ... Col(i) << Suppress Eval(0) ... Wait(0) ... << End Data Update. I tried it on 10 columns. It didn't work.
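For reference, the loop I tried looked roughly like this (formulaNames here is a stand-in for my actual list of formula column names):

```jsl
// Roughly what I ran on 10 columns as a test; it did not speed things up
NamesDefaultToHere( 1 );
dt = CurrentDataTable();
dt << beginDataUpdate;
For( i = 1, i <= 10, i++,
	Column( dt, formulaNames[i] ) << suppressEval( 0 ); // formulaNames: list of formula column names
	Wait( 0 );
);
dt << endDataUpdate;
```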
So far, nothing I have tried has worked faster than manually joining the tables by row with formula evaluation turned on, and that took days...
P.S. Is there a way to Break a running script?