maryam_nourmand
Level III

data preprocessing

Hello.
My question is: when we use nonparametric models such as neural networks, SVM, or Naive Bayes in the software, does the software scale our data by default? Or should we scale our numeric data ourselves before using these models?

1 ACCEPTED SOLUTION

Victor_G
Super User

Re: data preprocessing

OK, but in my (very long) answer, I did mention tree-based methods and Naive Bayes:

Tree-based models and probability-based algorithms like Naive Bayes may not require scaling.

Tree-based methods don't require scaling because they are not distance-based algorithms: splits are chosen from the ordering of the data and the information gained by splitting at a given threshold, so the magnitude, range, or distribution of the individual values has no influence on the split results.
Naive Bayes is a probability-based algorithm: it calculates probabilities from the data's distribution and is invariant to the scale of the data.
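
To make the two points above concrete, here is a minimal sketch in plain Python (not JSL, and not a description of how JMP implements these platforms internally). It assumes a single made-up feature, a Gini-based search for the best split, and a Gaussian form of Naive Bayes; all function names and the toy data are hypothetical and only there for illustration. It compares the chosen split and the predicted class probabilities before and after standardizing the feature.

```python
import math
from statistics import mean, pstdev


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split_partition(xs, ys):
    """Indices falling left of the Gini-best threshold on a single feature."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_score, best_left = None, None
    for cut in range(1, len(order)):
        if xs[order[cut - 1]] == xs[order[cut]]:
            continue  # no threshold can separate equal values
        left, right = order[:cut], order[cut:]
        # The score uses only the labels on each side, never the x values.
        score = sum(len(g) * gini([ys[i] for i in g]) for g in (left, right))
        if best_score is None or score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


def gaussian_nb_posterior(xs, ys, x_new):
    """Class probabilities for x_new, assuming one Gaussian per class."""
    scores = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        mu, sigma = mean(vals), pstdev(vals)
        log_lik = -math.log(sigma) - 0.5 * ((x_new - mu) / sigma) ** 2
        scores[c] = math.log(len(vals) / len(xs)) + log_lik
    top = max(scores.values())
    total = sum(math.exp(v - top) for v in scores.values())
    return {c: math.exp(v - top) / total for c, v in scores.items()}


# Made-up toy data: one numeric feature on a large scale, two classes.
xs = [1200.0, 1500.0, 1800.0, 2500.0, 3100.0, 3600.0, 4000.0, 4400.0]
ys = ["low", "low", "low", "low", "high", "low", "high", "high"]

m, s = mean(xs), pstdev(xs)


def scale(x):
    """Standardize with the training mean and standard deviation."""
    return (x - m) / s


xs_scaled = [scale(x) for x in xs]

split_raw = best_split_partition(xs, ys)
split_scaled = best_split_partition(xs_scaled, ys)
post_raw = gaussian_nb_posterior(xs, ys, 3000.0)
post_scaled = gaussian_nb_posterior(xs_scaled, ys, scale(3000.0))

print(sorted(split_raw), sorted(split_scaled))
print(post_raw, post_scaled)
```

With this toy data, the rows sent to each side of the split and the class probabilities should come out the same (up to floating-point rounding) for the raw and the standardized feature. Note that the Naive Bayes part relies on the Gaussian assumption and on a linear rescaling; a nonlinear transform such as a log would change the fitted distributions, whereas the tree split only depends on the ordering of the values.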

 

Some further resources:

https://stats.stackexchange.com/questions/244507/what-algorithms-need-feature-scaling-beside-from-sv...

https://www.dataschool.io/comparing-supervised-learning-algorithms/

https://forecastegy.com/posts/do-decision-trees-need-feature-scaling-or-normalization/

 

Does this additional response answer your question?

Victor GUILLER
Scientific Expertise Engineer
L'Oréal - Data & Analytics


4 REPLIES
Victor_G
Super User

Re: data preprocessing

Hi @maryam_nourmand,

Did you read my reply to one of your similar questions? https://community.jmp.com/t5/Discussions/data-preprocessing/m-p/761840/highlight/true#M93976
maryam_nourmand
Level III

Re: data preprocessing

Yes, I read it,
but you mentioned SVM, KNN, neural networks, and (linear & logistic) regression.
I actually want to know about other models:
boosted tree
bootstrap forest
decision tree
Naive Bayes

Thanks

maryam_nourmand
Level III

Re: data preprocessing

Thanks a lot