Multithreading in JSL

rocker · 01-09-2018

As computing has advanced, I've gone from 1 core to 2 cores to 4 cores... to 64 threads on my workstation. Meanwhile, JSL has effectively remained single-threaded. I know some platforms will utilize all threads, but I'd like to be able to farm out simple JSL operations to multiple threads at will. For example, say I have a script that iterates over 10,000,000 rows of data in a For loop, and I know each row's result is independent of the other rows. I'd like to split that operation into 64 asynchronous For loops, pause other execution while the 64 loops complete in parallel, then return to a single-threaded, synchronous script once all are complete, much like I would in Python with Pool() and poolObject.join().
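For reference, the Python pattern described above looks roughly like the sketch below. It uses `multiprocessing.pool.ThreadPool`, a thread-backed pool with the same interface as `Pool`, so that it stays self-contained. The names `process_row`, `process_chunk`, and `parallel_rows`, along with the worker count, are hypothetical and chosen only for illustration; nothing here is JSL or a JMP API.

```python
from multiprocessing.pool import ThreadPool


def process_row(x):
    # Hypothetical per-row operation: the result depends only on this row.
    return x * x


def process_chunk(rows):
    # Each worker handles one contiguous block of rows.
    return [process_row(x) for x in rows]


def parallel_rows(rows, n_workers=4):
    # Split the rows into n_workers contiguous chunks (some may be empty).
    n = len(rows)
    bounds = [(k * n // n_workers, (k + 1) * n // n_workers)
              for k in range(n_workers)]
    chunks = [rows[a:b] for a, b in bounds]

    pool = ThreadPool(n_workers)
    try:
        # map() blocks until every chunk is finished, preserving chunk order.
        parts = pool.map(process_chunk, chunks)
    finally:
        pool.close()
        pool.join()  # wait for the workers to exit before continuing

    # Back to ordinary single-threaded code: flatten the per-chunk results.
    return [y for part in parts for y in part]


results = parallel_rows(list(range(10)), n_workers=4)
```

In CPython, threads do not speed up CPU-bound pure-Python work, so a real script would typically swap in `multiprocessing.Pool` (with the usual `if __name__ == "__main__":` guard) to get separate processes.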

XanGregg · 04-02-2018

As far as farming out simple JSL operations to multiple threads, there is a function called Parallel Assign() for that. It breaks the work into chunks where the code for each chunk results in a single number, and the result of the entire operation is a matrix of numbers.

There is a toy example in the Scripting Index for Parallel Assign(), and the second example for Frame Size() shows a more elaborate example for computing Mandelbrot set data. Another small example is in @Craige_Hales's JSL Cookbook post on loops, and there's more about Mandelbrot in his blog.

I've included another example below which finds the largest file in a list of files using multi-threading. The script runs a single-threaded version first, and both bursts of activity show up in a task manager: the single-threaded version is the long mesa and the parallel version is the tall peak. For 17,000 files on 8 cores, the time went from 13s to 4.5s, or about 3x faster. Not ideal, but as is often the case, the threads are sharing some resources, such as the disk access in this case.

The JSL that executes in parallel does need to be simple so that it doesn't modify the shared global state. That's why some globals are copied in as the first step. It also means that the parallel code can't do things like create data tables or interact with the display.

We're interested in knowing what other kinds of simple actions might need a richer interface. One that I know of is for each worker to return items other than numbers. Essentially, that would mean using a list for the results instead of a matrix.

----

Names Default To Here( 1 );
// Example for Parallel Assign().

// get a list of a lot of files
dir = If(Host is("Mac"), "$SAMPLE_DATA", "$JMP_HOME");
all files = Files In Directory( dir, "recursive" );
naf = N Items( all files );

// convert relative paths to full paths
For( i = 1, i <= naf, i++,
	all files[i] = dir || "/" || all files[i]
);

// Function that looks at a range of a given file list and returns
// the index of the largest file.
// Importantly, this function uses no global variables.
largest file = Function( {files, first, last},
	{i, i large = -1, bytes large = -1, bytes}, 
	For( i = first, i <= last, i++,
		bytes = File Size( files[i] );
		If( bytes > bytes large,
			bytes large = bytes;
			i large = i;
		);
	);
	i large;
);

// Single-threaded approach: send all the files to the function.
t0 = Tick Seconds();
lf = largest file( all files, 1, naf );
t1 = Tick Seconds();
Show( naf, lf, all files[lf], t1 - t0 );
// about 13sec on my machine (17k files on Win)

wait(1);	// so we can see a break in a CPU activity monitor

// Multi-threaded approach: break the list up into 8 ranges that
// can be handled by separate threads.
t0 = Tick Seconds();
num chunks = 8;	// JMP may create this many threads, or fewer

// initialize to -1, so we can detect failure if desired
results = J( num chunks, 1, -1 );
Parallel Assign(
	// copy these globals into the thread's local variables
	{tf = Name Expr( largest file ), tfiles = all files, tn = num chunks},
	// make this assignment for each chunk:
	results[i] = Local( {n = N Items( tfiles )},
		tf( tfiles, 1 + Floor( (i - 1) * n / tn ), Floor( i * n / tn ) )
	)
);
// We need to call the function once more to see which of the top
// files is the overall largest.
top files = all files[results];
toplf = largest file( top files, 1, num chunks );
lf = results[toplf];
t1 = Tick Seconds();
Show( naf, lf, all files[lf], t1 - t0 );
// about 4.5sec
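
The chunk arithmetic and the two-stage reduction in the script can be checked in isolation. The following is a minimal Python sketch, not JSL: `chunk_bounds` and `largest_in_range` are hypothetical helpers that mirror the `1 + Floor( (i - 1) * n / tn )` and `Floor( i * n / tn )` bounds and the `largest file` function, and `sizes` is made-up data standing in for file sizes. Indices are kept 1-based to match the JSL.

```python
def chunk_bounds(i, n, tn):
    """1-based inclusive (first, last) range for chunk i of tn over n items."""
    return 1 + ((i - 1) * n) // tn, (i * n) // tn


def largest_in_range(sizes, first, last):
    """1-based index of the largest value in sizes[first..last], or -1 if empty."""
    i_large, best = -1, -1
    for i in range(first, last + 1):
        if sizes[i - 1] > best:
            best, i_large = sizes[i - 1], i
    return i_large


sizes = [40, 7, 310, 25, 310, 3, 99, 12, 5, 64, 1]  # made-up "file sizes"
n, tn = len(sizes), 4

# The ranges tile 1..n with no gaps and no overlaps.
bounds = [chunk_bounds(i, n, tn) for i in range(1, tn + 1)]

# Stage 1: one winner per chunk (the part Parallel Assign() runs in parallel).
winners = [largest_in_range(sizes, a, b) for a, b in bounds]
winners = [w for w in winners if w != -1]  # drop empty chunks, if any

# Stage 2: pick the overall largest among the per-chunk winners.
overall = max(winners, key=lambda idx: sizes[idx - 1])
```

With more chunks than items, some ranges come out empty and the helper returns -1, which is why the sketch filters those out before the second stage. The JSL script would presumably need a similar guard before indexing `all files[results]`.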

Jeff_Perkinson · 02-29-2020

While this wish has many votes, unfortunately anything more than what is offered by Parallel Assign() will be difficult to deliver. Because of the tight integration with the user interface and the interactive nature of JMP, it is not possible to provide a thread-safe environment without limiting the possible JSL down to essentially what Parallel Assign() allows.

The best way to take advantage of multithreading is to use the built-in platforms as much as possible. These platforms are where we can provide multithreaded solutions directly. Opening other wishes for built-in operations where you think that multithreading would be useful is the best way to let us know where you'd like to see more multi-threading.

Jeff_Perkinson · 01-25-2021

As noted directly above, this wish is beyond the scope of what we're going to do in JSL. If there are specific operations that you believe would benefit from multithreading, please open other wishes with the details.

shampton82 · 09-19-2022

Hey Jeff,

Where can I find which platforms and built-in tasks take advantage of multi-threading?

Jeff_Perkinson · 09-20-2022

Hi Steve,

There are lots of places in JMP where we try to take advantage of multiple cores: everything from saving/opening a compressed data table (each column is sent to a different thread), to the Distribution platform with multiple columns (each distribution analysis is on a different thread), to the Mersenne-Twister random uniform generator. As you can see, multi-threading is done in lots of different places.

So, we don't really have a list like that. What drives the question here?