Re: Make JSL script more efficient for sampling a 'sliding window'

Jul 30, 2024 07:07 AM

Hello,

I'm trying to create a script that samples a data table with a 'sliding window': it takes a set number of consecutive rows (the window 'width'), then moves along one row and takes the next set, giving rows 1-10, 2-11, 3-12 and so forth. The result is a new data set that I can use for pattern analysis.

My script works, but it doesn't cope well with larger databases: the analysis is slow (>1 h) or even crashes JMP. Are there any suggestions for how to improve it?

Thanks!

// Define the window size (number of rows) and the increment (how much the standardised time should increase by)
windowSize = 9;
increment = 91;

// Open the data table, pad it with windowSize empty rows, and get the number of rows
dt = Open("$SAMPLE_DATA/Time Series/GNP.jmp");
dt << Add Rows(windowSize);
numRows = N Rows(dt);

// Create a list to store the subsets
subsets = {};
standTimeMatrix = J(windowSize, 1, 0);  // Pre-allocate matrix for Stand Time

// Fill the Stand Time matrix
dt << begin data update;
For(j = 1, j <= windowSize, j++,
    standTimeMatrix[j, 1] = 1 + (j - 1) * increment;
);

// Loop through the data table to create the sliding windows
For(i = 1, i <= numRows - windowSize + 1, i++,
    // Get the subset of data for the current window
    subset = dt << Subset(Rows(i::i + windowSize - 1));

    // Add the window number to the subset
    windowNumberColumn = Repeat(i, windowSize);
    subset << New Column("Window Number", Numeric, Continuous, Set Values(windowNumberColumn));

    // Add the "Standardised Time" column to the subset
    subset << New Column("Standardised Time", Numeric, Continuous, Set Values(standTimeMatrix));

    // Add the subset to the list
    Insert Into(subsets, subset);
);


// Create a new table for concatenation
newTable = New Table("All Data");


// Run the concatenations using the subsets list

For(i = 1, i <= N Items(subsets), i++,
    newTable << Concatenate(
        subsets[i],
        append to first table(1)
    );
    // Close the subset table after concatenation to free up memory
    Close(subsets[i], No Save);
);

jthi · Jul 30, 2024 07:32 AM

I'm not sure whether this does what you want (it gives a different result than your script), or what counts as a "large database", but maybe something like this would work:

Names Default To Here(1);

size = 10;

dt = Open("$SAMPLE_DATA/Time Series/GNP.jmp", Invisible);

dt_results = dt << Clone;
dt_results << Show Window(0);

dt_results << Delete Rows(1::N Rows(dt_results));
dt_results << New Column("Idx", Numeric, Ordinal);
dt_results << New Column("Rows", Numeric, Ordinal);

cols = dt << Get Column Names("String");

i = 0;
While(i + size <= N Rows(dt),
	old_rows = (1+i)::(i+size);
	dt_temp = dt << Subset(Rows(old_rows), Selected Columns(0), Invisible);
	dt_temp << New Column("Idx", Numeric, Ordinal, Set Each Value(i + 1));
	dt_temp << New Column("Rows", Numeric, Ordinal, Values(old_rows));
	
	dt_results << Concatenate(dt_temp,
		"Append to first table"
	);
	Close(dt_temp, no save);
	
	i++;
);

dt_results << Show Window(1);
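
If only the numeric columns are needed, a matrix-based variant might avoid creating and closing one temporary table per window. The following is a minimal sketch under that assumption, not a drop-in replacement: `Get As Matrix` returns the numeric columns only, so character columns and column formats (dates, for example) are lost, and the names `winGrid`, `rowGrid`, `winIdx` and `rowIdx` are made up for illustration. It also assumes the table has at least `size` rows.

```jsl
Names Default To Here(1);

size = 10;

dt = Open("$SAMPLE_DATA/Time Series/GNP.jmp", Invisible);

// Assumption: only the numeric columns are needed (Get As Matrix skips the others)
m = dt << Get As Matrix;
names = dt << Get Column Names(Numeric, String);

nWin = N Rows(dt) - size + 1; // number of complete windows

// nWin x size matrices: entry [w, k] describes the k-th row of window w
winGrid = Transpose(1::nWin) * J(1, size, 1);        // window number
rowGrid = winGrid + J(nWin, 1, 1) * (0::(size - 1)); // source row number

// Flatten row by row so that the rows of each window stay together
winIdx = Shape(winGrid, nWin * size, 1);
rowIdx = Shape(rowGrid, nWin * size, 1);

// Pull all windows in one indexing step and append the two bookkeeping columns
result = m[rowIdx, 0] || winIdx || rowIdx;

Insert Into(names, "Idx");
Insert Into(names, "Rows");
dt_results = As Table(result, <<Column Names(names));
```

The result has `nWin * size` rows, with "Idx" holding the window number and "Rows" the original row number, the same layout as the loop version. Because everything is built in memory at once, very long tables combined with wide windows may still need to be processed in chunks.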

-Jarmo

hogi · Jul 30, 2024 07:34 AM

Maybe it's not necessary to create the auxiliary table?

Col Moving Average provides some settings to compare values within a sliding window.
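
As a minimal sketch of that idea, assuming the table has a numeric column named "GNP" (substitute your own column) and that a trailing window of 10 rows is wanted, a formula column could look like this:

```jsl
Names Default To Here(1);

dt = Open("$SAMPLE_DATA/Time Series/GNP.jmp");

// Assumption: the table has a numeric column named "GNP"
// Arguments: column, weighting (1 = equal weights), rows before, rows after
// 9 rows before plus the current row gives a 10-row trailing window
dt << New Column("GNP Moving Average", Numeric, Continuous,
	Formula(Col Moving Average(:GNP, 1, 9, 0))
);
```

This keeps one value per original row instead of duplicating rows into a stacked table, so it only helps if the later analysis can work from per-row window statistics. The first rows have fewer than ten values available; as far as I know, an optional further argument controls whether such partial windows return missing values.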

What are your next steps?