Solved: Re: data preprocessing

maryam_nourmand · Jun 16, 2024 05:40 PM

Hello.
My question: when we use a nonparametric model in software, such as a neural network, SVM, or Naive Bayes, does the software scale our data by default? Or should we scale our numeric data ourselves before using these models?

Victor_G · Jun 16, 2024 07:12 PM

Hi @maryam_nourmand,

Did you read my reply to one of your similar questions: https://community.jmp.com/t5/Discussions/data-preprocessing/m-p/761840/highlight/true#M93976 ?

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

maryam_nourmand · Jun 17, 2024 03:23 AM

Yes, I read it,
but you mentioned SVM, KNN, neural networks, and (linear and logistic) regression.
I actually want to know about other models:
boosted tree
bootstrap forest
decision tree
Naive Bayes

Thanks

Victor_G · Jun 17, 2024 1:31 AM

OK, but in my (very long) answer, I did mention tree-based methods and Naive Bayes:

Tree-based models and probability-based algorithms like Naive Bayes may not require scaling.

Tree-based methods don't require scaling because they are not distance-based algorithms. Splits are chosen from the order of the values and the information gained by splitting at a given threshold, so rescaling a predictor (or applying any other order-preserving transformation) moves the thresholds but leaves the resulting partitions unchanged.
Naive Bayes is a probability-based algorithm: it estimates class probabilities from each predictor's distribution within each class, so its predictions are not affected by the units or scale of the predictors.
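
To make both points concrete, here is a minimal pure-Python sketch. It is not how JMP implements these platforms; the function names are hypothetical, the tree is reduced to a single Gini split on one predictor, and the Naive Bayes part assumes a Gaussian distribution for the predictor within each class. It only checks that a linear rescaling of the predictor changes neither the tree partition nor the Naive Bayes predictions on a small toy data set.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_left_rows(x, y):
    """Row indices sent to the left child by the best Gini split on one predictor."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_left = float("inf"), frozenset()
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold fits between two equal values
        left, right = order[:k], order[k:]
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(x)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left


def fit_gaussian_nb(x, y):
    """Per-class (prior, mean, variance) for a single numeric predictor."""
    params = {}
    for c in set(y):
        vals = [xi for xi, yi in zip(x, y) if yi == c]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        params[c] = (len(vals) / len(x), mean, var)
    return params


def predict_gaussian_nb(params, value):
    """Class with the highest log posterior (up to a shared constant)."""
    def log_posterior(c):
        prior, mean, var = params[c]
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (value - mean) ** 2 / (2 * var))
    return max(params, key=log_posterior)


# Toy data (made up for illustration) and a rescaled copy of the predictor.
x = [1.0, 1.5, 2.0, 2.5, 6.0, 6.5, 7.0, 8.0]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
x_scaled = [1000.0 * v + 5.0 for v in x]

tree_same = best_split_left_rows(x, y) == best_split_left_rows(x_scaled, y)

nb_raw = fit_gaussian_nb(x, y)
nb_scaled = fit_gaussian_nb(x_scaled, y)
pred_raw = [predict_gaussian_nb(nb_raw, v) for v in x]
pred_scaled = [predict_gaussian_nb(nb_scaled, v) for v in x_scaled]

print("tree partition unchanged:", tree_same)
print("Naive Bayes predictions unchanged:", pred_raw == pred_scaled)
```

The tree result follows from the split search only looking at the sorted order of the values. For the Gaussian Naive Bayes part, rescaling shifts every class's log density by the same constant, so the winning class stays the same; an implementation that adds smoothing to the variances may deviate slightly from this.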

Some further resources:

https://stats.stackexchange.com/questions/244507/what-algorithms-need-feature-scaling-beside-from-sv...

https://www.dataschool.io/comparing-supervised-learning-algorithms/

https://forecastegy.com/posts/do-decision-trees-need-feature-scaling-or-normalization/

Does this complementary response answer your question?

Victor GUILLER

"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)

maryam_nourmand · Jun 17, 2024 04:31 AM

Thanks a lot
