Good morning,
Can someone help me?
I would like to replace the empty values in numeric columns with the column mean.
I tried this:
dt = Current Data Table();
FILL = dt << get column names( Numeric );
For( i = 1, i <= N Items( FILL ), i++,
FILL[i][dt << get rows where( Is Missing( FILL[i][] ) )] = col mean(FILL[i]);
);
Thanks in advance,
Gianpaolo
Here is one way to do this:
NamesDefaultToHere(1);
// Make some test data
dt = NewTable("Test", NewColumn("Data", Numeric, Continuous, Formula(if(Mod(Row(), 2) == 0, RandomNormal()))));
dt << addRows(20);
dt << runFormulas;
Column(dt, "Data") << deleteFormula;
// Impute missing cells with the mean of the others
// Pause so the missing cells can be seen before they are filled
Wait(3);
col = Column(dt, "Data");
values = col << getValues;
values[Loc(IsMissing(values))] = Mean(values);
col << setValues(values);
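To apply the same idea to every numeric column, as asked in the question, the steps above could be wrapped in a loop. This is only a sketch: the names numericCols and missingRows are introduced here for illustration, and it assumes, as the script above does, that Mean() skips missing values.

```jsl
Names Default To Here( 1 );
dt = Current Data Table();
// Ask for the numeric column names as strings so they can be passed to Column()
numericCols = dt << Get Column Names( Numeric, String );
For( i = 1, i <= N Items( numericCols ), i++,
	col = Column( dt, numericCols[i] );
	values = col << Get Values;
	missingRows = Loc( Is Missing( values ) );
	// Skip columns with nothing to fill and columns that are entirely missing
	If( N Rows( missingRows ) > 0 & N Rows( missingRows ) < N Rows( values ),
		values[missingRows] = Mean( values );
		col << Set Values( values );
	);
);
```

Note that this treats every numeric column alike, including numeric ID or code columns where a mean may not be meaningful, so you may want to filter the list first.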
It is probably good practice to mark the values that were imputed, for example by colouring their cells.
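A minimal sketch of that marking step, assuming the "Data" column from the test table above and using the column message Color Cells; the colour and the name missingRows are arbitrary choices for illustration:

```jsl
Names Default To Here( 1 );
dt = Current Data Table();
col = Column( dt, "Data" );
values = col << Get Values;
// Remember which rows were missing before they are overwritten
missingRows = Loc( Is Missing( values ) );
If( N Rows( missingRows ) > 0,
	values[missingRows] = Mean( values );
	col << Set Values( values );
	// Colour only the cells that received an imputed value
	col << Color Cells( "Yellow", As List( missingRows ) );
);
```

The row positions have to be captured before Set Values is called, because afterwards the imputed cells can no longer be told apart from the observed ones.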
I didn't study your code in detail, but had the sense that you are doing more work than you need to. To make things more readable, you could consider using functions as below:
NamesDefaultToHere(1);
// Only impute if more than n cells are missing
imputeContinuousCol =
Function({col, n}, {Default Local},
	values = col << getValues;
	missingValuePos = Loc(IsMissing(values));
	if(NRow(missingValuePos) > n,
		// Reuse the positions found above rather than searching again
		values[missingValuePos] = Mean(values);
		col << setValues(values);
	);
);
// Only impute if more than n cells are missing
imputeCharacterCol =
Function({col, n}, {Default Local},
	values = col << getValues;
	missingValuePos = Loc(values, "");
	if(NRow(missingValuePos) > n,
		// Reuse the positions found above rather than searching again
		values[missingValuePos] = "NA";
		col << setValues(values);
	);
);
// Data table . . .
dt = DataTable("Big Class.jmp");
// List of column names that satisfy your imputation criteria . . .
imputeList = {"name", "sex", "height"};
// Loop over this list, and impute if necessary
for (c=1, c<=NItems(imputeList), c++,
	col = Column(dt, imputeList[c]);
	if(
		(col << getModelingType) == "Continuous",
			imputeContinuousCol(col, 1),
		(col << getDataType) == "Character",
			imputeCharacterCol(col, 1)
	);
);
Open 'Big Class', make some cells missing in the listed columns, then try it out.
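If it helps, the setup could also be scripted rather than done by hand. This sketch assumes Big Class is in the standard sample data folder; the rows listed in rowsToBlank are an arbitrary, hypothetical choice. Run it before the script above.

```jsl
Names Default To Here( 1 );
dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
// Hypothetical rows to blank out; any rows would do
rowsToBlank = {3, 7, 12};
For( r = 1, r <= N Items( rowsToBlank ), r++,
	Column( dt, "height" )[rowsToBlank[r]] = .;  // numeric missing value
	Column( dt, "name" )[rowsToBlank[r]] = "";   // empty string for character columns
	Column( dt, "sex" )[rowsToBlank[r]] = "";
);
```

Three cells are blanked in each column so that the "more than n" test in the functions passes with n set to 1.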