Subscribe Bookmark RSS Feed

Formula to find Mode of Daily values

evasonl

Community Trekker

Joined:

Jul 5, 2016

Hi,

I have a datatable which has multiple samples tracking different parameters for a collection of different systems within those samples (1 sample will have 1 system, but another sample may also have the same system). Each sample has a different state as it approaches it's end. These states are what I'm trying to predict. I have created a decision tree algorithm using some different samples in a seperate datatable which I then have added to this datatable. What this decision tree is trying to do is predict the state of the system at different points in time. I will then measure these predicted states against the known states of the system at these points in time (the algorithm was created with different samples so this is "fresh" data)
Now as you can imagine during the course of a day this formula can be sending hundreds or thousands of different predictions and not all of them are correct so I want a column formula that can look at all of these predictions, by sample number and by day, and output what the most commonly predicted state of the equipment (Mode)

 

In other words I'd like a column that calculates the mode Alert rating for each day of each sample of the data and output it's predicted Alert rating to each of these rows

 

In the example datatable I've attached I would like a formula that can automate the column titled "Alert Prediction for the day" and I don't mind it using the LogTime column or the time since start up column to create it's daily predictions. 

Thanks for your help

1 ACCEPTED SOLUTION

Accepted Solutions
ian_jmp

Staff

Joined:

Jun 23, 2011

Solution

Bearing in mind that's it's rather specific to your case, this should do it. You can use 'Help > Scripting Index' and the editor tooltips to help understand how it works. 

NamesDefaultToHere(1);

// Starting table
dt = DataTable("Example Datatable for Summarising Alert Ratings.jmp");

// Add required columns
dt << NewColumn("Date", Character, Formula(Short Date( :Logtime )));
dt << NewColumn("Group", Character, Formula((:System || " ") || :Date));

// Find the mode of :Decision Tree Prediction for each level in :Group, and fill in 'modeVals'
vals = Column(dt, "Decision Tree Prediction") << getValues;
gVals = Column(dt, "Group") << getValues;
levs = AssociativeArray(gVals) << getKeys;
modes = J(NItems(levs), 1, .);
modeVals = J(NRow(dt), 1, .);
for(i=1, i<=NItems(levs), i++,
	rows = Loc(AsList(gVals), levels[i]);
	modes[i] = Mode(vals[rows]);
	modeVals[rows] = modes[i];
	);

// Build the final column in dt using 'modeVals'
vl = Column(dt, "Decision Tree Prediction") << getProperty("Value Labels");
nc = dt << NewColumn("Modes for Decision Tree Prediction", Numeric, Nominal, Values(modeVals));
nc << valueLabels(Arg(vl, 1));

 

10 REPLIES
txnelson

Super User

Joined:

Jun 22, 2012

Does the solution have to be a formula or would a script that does the calculations and joins the results back to the original table be acceptable? The reason I ask, is that I believe using a formula would be many times less efficient because of all of the replicate work each row would be doing.
Jim
evasonl

Community Trekker

Joined:

Jul 5, 2016

Originally I was going for a formula, but I definitely see your point and think overall the more efficient solution would definitely be the best. So yes a script would be really useful thanks

ian_jmp

Staff

Joined:

Jun 23, 2011

If both columns are numeric, you could do it this way:

NamesDefaultToHere(1);
ClearLog();

// Given a set of values ('vals') and group membership ('gVals'),
// returns the distinct levels in 'gVals' and the mode of each group
modeWithBy =
Function({vals, gVals}, {Default Local},
	levs = AssociativeArray(gVals) << getKeys;
	modes = J(NItems(levs), 1, .);
	for(i=1, i<=NItems(levs), i++, modes[i] = Mode(vals[Loc(AsList(gVals), levs[i])]));
	EvalList({levs, modes});
);

l = [1, 2, 1, 1, 2, 2, 3];
g = [1, 1, 1, 1, 2, 2, 2];
Print(modeWithBy(l, g));
evasonl

Community Trekker

Joined:

Jul 5, 2016

The group "Sample" can be made to be continuous, but it is definitely a nominal data set. The group System is not able to be made continuous without using value labels and also is a nominal data set. How would this then related to organise it to find the mode over all of the groups (System and Sample) followed by organising the data into a day by day basis for each combination of them

Also I'd like these values to be merged back into the datatable so you can see what the predicted alert is on a day by day basis

Thanks for your help

evasonl

Community Trekker

Joined:

Jul 5, 2016


ian_jmp wrote:

If both columns are numeric, you could do it this way:

NamesDefaultToHere(1);
ClearLog();

// Given a set of values ('vals') and group membership ('gVals'),
// returns the distinct levels in 'gVals' and the mode of each group
modeWithBy =
Function({vals, gVals}, {Default Local},
	levs = AssociativeArray(gVals) << getKeys;
	modes = J(NItems(levs), 1, .);
	for(i=1, i<=NItems(levs), i++, modes[i] = Mode(vals[Loc(AsList(gVals), levs[i])]));
	EvalList({levs, modes});
);

l = [1, 2, 1, 1, 2, 2, 3];
g = [1, 1, 1, 1, 2, 2, 2];
Print(modeWithBy(l, g));


The group "Sample" can be made to be continuous, but it is definitely a nominal data set. The group System is not able to be made continuous without using value labels and also is a nominal data set. How would this then related to organise it to find the mode over all of the groups (System and Sample) followed by organising the data into a day by day basis for each combination of them

Also I'd like these values to be merged back into the datatable so you can see what the predicted alert is on a day by day basis

Thanks for your help

ian_jmp

Staff

Joined:

Jun 23, 2011

If I understand correctly, you want to define a group for each value of 'System' combined with the 'date part' of 'Logtime' (maybe use the 'ShortDate()' function to get this into a new column).

 

If that's the case, you could either generalise the code above to work with more than one 'by' variable, else (perhaps simpler) just concatenate two character columns together and use this to define the groups. Finally, as you point out, you would need to adapt the code to work when 'gVals' is character. Maybe:

NamesDefaultToHere(1);
ClearLog();

// Given a set of values ('vals') and group membership ('gVals'),
// returns the distinct levels in 'gVals' and the mode of each group
modeWithBy =
Function({vals, gVals}, {Default Local},
	levs = AssociativeArray(gVals) << getKeys;
	modes = {};
	for(i=1, i<=NItems(levs), i++, InsertInto(modes, Mode(vals[Loc(AsList(gVals), levs[i])])));
	EvalList({levs, modes});
);

l = [1, 2, 1, 1, 2, 2, 3];
g = {"1", "1", "1", "1", "2", "2", "2"};
Print(modeWithBy(l, g));

 

evasonl

Community Trekker

Joined:

Jul 5, 2016

So effectively the logic looks like this:

for each Sample -> Split them up into days (could use the datetime column or the column I have already created which has for each row the number of days since the start of this particular sample. Yes thanks, using shortcode might be a way to make the grouping of days easier)

 

Then for each sample look at the column which is predicting the Alert state and then for every day of that sample look at every prediction that is made and find the mode value for it (which is ordinal). Then take that output mode value and inset it back into the datatable on each row of a new column (hence why I originally wanted it as a formula) for that day of that sample. I think it's clearer when you read this looking at the datatable

Does that help make a bit more sense?

evasonl

Community Trekker

Joined:

Jul 5, 2016

shortdate* sorry, not shortcode

ian_jmp

Staff

Joined:

Jun 23, 2011

Solution

Bearing in mind that's it's rather specific to your case, this should do it. You can use 'Help > Scripting Index' and the editor tooltips to help understand how it works. 

NamesDefaultToHere(1);

// Starting table
dt = DataTable("Example Datatable for Summarising Alert Ratings.jmp");

// Add required columns
dt << NewColumn("Date", Character, Formula(Short Date( :Logtime )));
dt << NewColumn("Group", Character, Formula((:System || " ") || :Date));

// Find the mode of :Decision Tree Prediction for each level in :Group, and fill in 'modeVals'
vals = Column(dt, "Decision Tree Prediction") << getValues;
gVals = Column(dt, "Group") << getValues;
levs = AssociativeArray(gVals) << getKeys;
modes = J(NItems(levs), 1, .);
modeVals = J(NRow(dt), 1, .);
for(i=1, i<=NItems(levs), i++,
	rows = Loc(AsList(gVals), levels[i]);
	modes[i] = Mode(vals[rows]);
	modeVals[rows] = modes[i];
	);

// Build the final column in dt using 'modeVals'
vl = Column(dt, "Decision Tree Prediction") << getProperty("Value Labels");
nc = dt << NewColumn("Modes for Decision Tree Prediction", Numeric, Nominal, Values(modeVals));
nc << valueLabels(Arg(vl, 1));