
data preprocessing

maryam_nourmand — Sun, 16 Jun 2024 21:40:22 GMT

Hello.
My question is: when we use a nonparametric model in software, such as a neural network, SVM, or Naive Bayes, does the software scale our data by default? Or should we scale our numeric data before using these models?

Re: data preprocessing

Victor_G — Sun, 16 Jun 2024 23:12:10 GMT

Hi @maryam_nourmand,

Did you read my reply to one of your similar questions: https://community.jmp.com/t5/Discussions/data-preprocessing/m-p/761840/highlight/true#M93976 ?

Re: data preprocessing

maryam_nourmand — Mon, 17 Jun 2024 07:23:44 GMT

Yes, I read it,
but you mentioned SVM, KNN, neural networks, and (linear & logistic) regression.
I actually want to know about other models:
boosted tree
bootstrap forest
decision tree
Naive bayes

thanks

Re: data preprocessing

Victor_G — Mon, 17 Jun 2024 08:31:10 GMT

OK, but in my (very long) answer, I did mention tree-based methods and Naive Bayes:

Tree-based models and probability-based algorithms like Naive Bayes may not require scaling.

Tree-based methods don't require scaling because they are not distance-based algorithms. Splits are chosen based on the order of the data and the information gained by splitting at a given threshold, so rescaling a predictor (changing its individual values or range while preserving their order) has no influence on the split results.
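To illustrate the point about ordering, here is a minimal sketch in plain Python (not JMP's implementation; the helper names `gini` and `best_split` and the toy data are made up for this example). It searches for the single best split on one predictor using Gini impurity, then repeats the search after min-max scaling and after a log transform. Assuming an order-preserving transformation, the same rows should land on the left side of the split each time; only the numeric threshold changes.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(x, y):
    """Return (threshold, left_rows) for the split with the lowest weighted Gini."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left, right = order[:k], order[k:]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(x)
        if best is None or score < best[0]:
            best = (score, (lo + hi) / 2.0, frozenset(left))
    return best[1], best[2]

# Toy data (hypothetical): one numeric predictor, binary response.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [0, 0, 0, 1, 1, 1]

# Two order-preserving transformations of the same predictor.
x_minmax = [(v - min(x)) / (max(x) - min(x)) for v in x]
x_log = [math.log(v) for v in x]

thr_raw, left_raw = best_split(x, y)
thr_minmax, left_minmax = best_split(x_minmax, y)
thr_log, left_log = best_split(x_log, y)

# The thresholds differ, but the rows sent to the left branch are the same.
print(thr_raw, thr_minmax, thr_log)
print(left_raw == left_minmax == left_log)
```

The split score depends only on which labels fall on each side, so any transformation that keeps the rows in the same order yields the same partition.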
Naive Bayes is a probability-based algorithm: it calculates probabilities from the data's distribution and is invariant to the scale of the data.
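As a rough check of the Naive Bayes claim, the sketch below fits a one-predictor Gaussian Naive Bayes by hand in plain Python and compares the posterior class probabilities before and after a linear rescaling. This is an assumption-laden illustration, not how JMP computes it: the functions `fit_gaussian_nb` and `posterior`, the toy data, and the choice of a Gaussian likelihood without any variance smoothing are all mine.

```python
import math

def fit_gaussian_nb(x, y):
    """Estimate (prior, mean, variance) per class for one numeric predictor."""
    params = {}
    for c in set(y):
        vals = [v for v, label in zip(x, y) if label == c]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        params[c] = (len(vals) / len(x), mean, var)
    return params

def posterior(params, v):
    """Posterior class probabilities for a single new value v."""
    joint = {}
    for c, (prior, mean, var) in params.items():
        density = (math.exp(-((v - mean) ** 2) / (2.0 * var))
                   / math.sqrt(2.0 * math.pi * var))
        joint[c] = prior * density
    total = sum(joint.values())
    return {c: j / total for c, j in joint.items()}

def rescale(v):
    """Arbitrary linear rescaling, e.g. a change of units plus an offset."""
    return 100.0 * v + 5.0

# Toy data (hypothetical): one numeric predictor, binary response.
x = [1.0, 2.0, 3.0, 6.0, 8.0, 10.0]
y = [0, 0, 0, 1, 1, 1]
new_point = 4.0

p_raw = posterior(fit_gaussian_nb(x, y), new_point)
p_scaled = posterior(fit_gaussian_nb([rescale(v) for v in x], y),
                     rescale(new_point))

# Posteriors agree up to floating-point rounding.
print(p_raw)
print(p_scaled)
```

Rescaling multiplies every class-conditional density by the same constant, which cancels when the posteriors are normalised. Implementations that add a fixed smoothing term to the variances may show small differences.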

Some further resources:

https://stats.stackexchange.com/questions/244507/what-algorithms-need-feature-scaling-beside-from-svm

https://www.dataschool.io/comparing-supervised-learning-algorithms/

https://forecastegy.com/posts/do-decision-trees-need-feature-scaling-or-normalization/

Does this complementary response answer your question?

Re: data preprocessing

maryam_nourmand — Mon, 17 Jun 2024 08:31:29 GMT

Thanks a lot