
Multithreading in the Fit Model Platform

For very large data sets, Fit Model with many 'By' variables has limited performance. Even when the model execution doesn't run out of memory, the processing time is quite slow. For example, a 50-million-row data file with X and Y coordinates as the model effects, the Z coordinate as the Y variable to fit, and 1000 distinct 'By' variables will fail on a system with 256GB of RAM and multiple processors. (Memory is what kills it here, but even assuming infinite RAM, the compute time would still be very slow since it is single-threaded.)


The wish is for the Fit Model platform to bundle the model calculations into smaller, more manageable batches of 'By' variables and use multithreading to work through the workload until every 'By' variable has a response generated. In the particular case I am working with, the Personality is 'Standard Least Squares' and the Emphasis is 'Minimal Report'. Once the response is generated, I capture the residual and predicted values, which also take time to write back to the data table.
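
As a rough illustration of the batching pattern described above (not JMP code, and not how Fit Model is implemented), the following Python sketch splits the 'By' groups into batches, fits Z = b0 + b1*X + b2*Y by ordinary least squares within each group on a worker pool, and writes the predicted and residual values back one batch at a time. The function names, `batch_size`, and `workers` are all hypothetical. Each group is assumed to contain at least three non-collinear points. In CPython, threads will not speed up pure-Python arithmetic, so this only shows the scheduling structure; a native implementation would be needed for a real gain.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_group(rows):
    """Fit z = b0 + b1*x + b2*y for one By group via the normal equations.

    rows: list of (row_index, x, y, z).
    Returns a list of (row_index, predicted, residual).
    """
    xtx = [[0.0] * 3 for _ in range(3)]
    xtz = [0.0] * 3
    for _, x, y, z in rows:
        v = (1.0, x, y)  # intercept, X effect, Y effect
        for i in range(3):
            xtz[i] += v[i] * z
            for j in range(3):
                xtx[i][j] += v[i] * v[j]
    b0, b1, b2 = solve3(xtx, xtz)
    out = []
    for idx, x, y, z in rows:
        pred = b0 + b1 * x + b2 * y
        out.append((idx, pred, z - pred))
    return out


def fit_batch(groups):
    """Fit every group in one batch and flatten the per-row results."""
    return [result for rows in groups for result in fit_group(rows)]


def fit_by_groups(data, batch_size=2, workers=4):
    """data: list of (by_value, x, y, z) rows.

    Returns (predicted, residual) lists aligned with the input rows.
    """
    grouped = defaultdict(list)
    for idx, (by, x, y, z) in enumerate(data):
        grouped[by].append((idx, x, y, z))
    groups = list(grouped.values())
    # Bundle the By groups into fixed-size batches of work.
    batches = [groups[i:i + batch_size] for i in range(0, len(groups), batch_size)]

    predicted = [None] * len(data)
    residual = [None] * len(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Write results back one batch at a time rather than one row at a time.
        for results in pool.map(fit_batch, batches):
            for idx, pred, res in results:
                predicted[idx] = pred
                residual[idx] = res
    return predicted, residual
```

Calling `fit_by_groups(rows, batch_size=50, workers=8)` on a list of `(by_value, x, y, z)` tuples returns two lists in the original row order, which corresponds to the predicted and residual columns being saved to the data table.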



Is there an opportunity to manage Fit Model workloads more efficiently for large data sets?


Thank you,