Background
* Using Python in the JMP 19 scripting environment.
* Creating a pandas DataFrame (df) from an existing JMP data table (dt).
* Running a function that takes df as input and returns a new DataFrame. Let's call it df_results.
* Transferring the results back to a JMP data table column.
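To make the setup concrete, here is a minimal sketch of the round trip. The JMP-side calls are left as comments (the exact jmp API isn't the issue), and the function and column names are stand-ins for my real code:

```python
import pandas as pd

# In JMP this df is built from the data table (dt); here it is
# stubbed with sample data so the sketch runs on its own.
df = pd.DataFrame({"X": [1.0, 2.0, 3.0, 4.0]})

def my_function(frame: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the real computation: takes df, returns a new DataFrame."""
    return pd.DataFrame({"Result": frame["X"] * 2.0})

df_results = my_function(df)

# Back in JMP, the result column is copied into the data table:
# dt["Result"] = df_results["Result"]
print(df_results["Result"].tolist())  # [2.0, 4.0, 6.0, 8.0]
```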
This works, but two things make me suspect I'm not doing it properly. First, I get a warning in the local log output. Second, the first time I run it on a very large number of rows it takes a long time: my code timer reported 25 seconds for 100k rows, but on a rerun the same step takes about 3 seconds. I set up the timer to report each step individually so I could see whether one line of code was taking most of the time.
This is the single line of code I'm questioning. It took 25 seconds for 100k rows only the first time, when the column (initialized as constant or null; I tried both) was first filled in. Every subsequent rerun was almost 10x faster:
dt[ColumnName] = df_results[ColumnName]
This is the warning message in the embedded log:
FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
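My guess is that the warning comes from something on the JMP side iterating the Series with integer keys, which goes through the deprecated positional path of Series.__getitem__. One workaround I could try is handing JMP a plain Python list (or NumPy array) instead of the Series, so positional access never involves pandas at all. A sketch of that conversion, with a stubbed df_results and hypothetical column name:

```python
import pandas as pd

# Stub standing in for the real function output.
df_results = pd.DataFrame({"Result": [10.5, 11.0, 12.5]})

# Converting the Series to a plain list means whatever consumes it
# indexes a built-in sequence, not a Series, so the deprecated
# Series.__getitem__ positional path is never exercised.
values = df_results["Result"].to_list()

# In JMP this would become: dt["Result"] = values
print(values)  # [10.5, 11.0, 12.5]
```

I don't know whether this also avoids the first-run slowdown, but it should at least silence the FutureWarning if the positional indexing is happening inside the transfer.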
Also, I did this for four columns total, from two different DataFrames. They all exhibited the same behavior (and the same warning): slow first run, fast on every repeat, with negligible difference between columns 1, 2, 3, and 4.
This is only a subset of a larger table with 1.3 million rows. If the time scales linearly with row count (13x the rows), 25 seconds per column becomes 300+ seconds per column, times 4 columns, unless I can figure out how to get the fast-repeat speed on the first try.
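For reference, the back-of-envelope projection I'm worried about, assuming the 25 s per 100k rows rate holds linearly:

```python
# Projected first-run cost on the full table, assuming linear scaling.
rows_total = 1_300_000
rows_tested = 100_000
seconds_first_run = 25
columns = 4

per_column = seconds_first_run * rows_total / rows_tested  # 325.0 seconds
total = per_column * columns                               # 1300.0 seconds
print(per_column, total)
```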
Thanks