Hi @Julianveda,
It looks like your ordinal response might be a sensory rating, very similar to a Likert scale: Likert scale — Wikipedia.
It might be very difficult to help you concretely on your use case, as it may depend on various factors: some linked to measurement system capability, agreement between operators, sensitivity, etc., and some linked to the characteristics of the data sample and the choice of the algorithm. Since @statman already covered measurement system capability, I will focus on the data sample and the choice of the algorithm.
- About the characteristics of the data sample:
7 points is indeed the recommended number of levels for this type of scale, but its effectiveness depends on the balance and representativeness of the data across the different levels, as well as on the data quantity (and quality! signal/noise ratio). In situations with strong imbalance, or if some ratings are never (or rarely) used by operators, it may be very difficult to analyze the data "as is", so it can be helpful to pre-process/clean the data, for example by binning/grouping some ratings/classes together. Reducing the number of classes can help logistic models or other algorithms "figure out" the rules that best separate the classes. But don't over-simplify your problem, or you risk either a "useless" model with only trivial conclusions/outcomes, or very noisy classes that each mix many different "realities"/levels.
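As a minimal sketch of the grouping idea (the column name, the cut points, and the values are assumptions for illustration, not from your data), here is how the 7 levels could be collapsed into 3 ordered classes with pandas:

```python
import pandas as pd

# Hypothetical 7-point ratings from -3 to 3 (values invented for illustration)
df = pd.DataFrame({"rating": [-3, -1, 0, 0, 1, 2, 3, -2, 0, 1]})

# Collapse the 7 levels into 3 ordered classes: low (-3..-1), mid (0), high (1..3)
df["rating_grouped"] = pd.cut(df["rating"],
                              bins=[-3.5, -0.5, 0.5, 3.5],
                              labels=["low", "mid", "high"],
                              ordered=True)
print(df["rating_grouped"].value_counts())
```

Where to place the cut points should come from your knowledge of the rating process, not from convenience alone.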
Also take into consideration that using your ratings as an ordinal response implies estimating more coefficients than with a continuous numerical response: you'll need to estimate n-1 intercepts (n being the number of classes) plus the parameter estimates linked to your factors (main effects, interactions, ... depending on your assumed model). With a DoE, you may not have enough data to estimate all these coefficients, so you might have to simplify your classes (by grouping some), and/or use a continuous response, and/or use a different type of model/algorithm (next section).
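To make the parameter count concrete, here is a small simulated sketch using statsmodels' OrderedModel (the factors, sample size, and data-generating model are all assumptions): with 7 classes and 2 factors, the ordinal logistic fit estimates 6 thresholds plus 2 slopes.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 60
# Two hypothetical DoE factors on a continuous [-1, 1] range
X = pd.DataFrame({"x1": rng.uniform(-1, 1, n), "x2": rng.uniform(-1, 1, n)})
# Simulated latent score mapped onto the 7 rating levels -3..3
latent = 1.5 * X["x1"] - 0.8 * X["x2"] + rng.normal(0, 0.7, n)
y = pd.cut(latent, bins=7, labels=range(-3, 4))  # ordered categorical response

res = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
# If all 7 classes are observed: 6 threshold parameters + 2 slopes = 8 coefficients
print(len(res.params))
```

With a 3-level grouped response instead, the same model would only need 2 thresholds plus the slopes, which is easier to support with a small DoE.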
The choice between ordinal and/or continuous responses may also depend on the number of classes in your response, as well as on whether the classes are uniformly/evenly spaced (from both a numerical and a sensory point of view): even if neighboring classes differ by one (uniformly spaced numerically), does the operator really "feel" this linearity and uniformity when rating? Or are there "gaps" or differences in perception (for example, little or no perceived difference between the intermediate classes -1, 0 and 1, or between the extreme classes -3 and -2, or 2 and 3)? Does the evaluation involve a benchmark (rated 0), so that your experiments can be evaluated relative to a reference sample?
- About the choice of the algorithm:
Depending on the linearity of the separation between classes/ratings (and the assumption of linearity between response and predictors), logistic regression may run into problems; it might be interesting to try other types of models/algorithms that can separate the classes in a non-linear way. Some Machine Learning algorithms are effective at non-linear class separation with a reduced risk of overfitting, so you could try, for example, Bootstrap Forest and Support Vector Machines. These algorithms can also be used if you decide to treat your response as continuous numerical.
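Here is an illustrative sketch on simulated data (not your measurements; in scikit-learn terms, Bootstrap Forest corresponds roughly to a random forest), where the class boundary is deliberately non-linear:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 120
# Hypothetical data: two factors, three grouped rating classes, curved boundary
X = rng.uniform(-1, 1, (n, 2))
radius = np.sqrt(X[:, 0] ** 2 + X[:, 1] ** 2)
y = np.digitize(radius, [0.5, 0.9])  # classes 0, 1, 2 depend non-linearly on the factors

rf = RandomForestClassifier(n_estimators=200, random_state=0)  # random forest ~ Bootstrap Forest
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))       # RBF kernel allows curved boundaries

for name, clf in [("Random Forest", rf), ("SVM (RBF)", svm)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.2f}")
```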
If your classes are linearly separable and your inputs continuous, you could also try Discriminant Analysis and see how the different methods agree or differ in their results.
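And a quick sketch of Discriminant Analysis on a deliberately linearly separable toy dataset (everything here is simulated for illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical linearly separable case: three classes whose means shift along x1
means = {0: (-1.0, 0.0), 1: (0.0, 0.0), 2: (1.0, 0.0)}
X = np.vstack([rng.normal(means[k], 0.4, (40, 2)) for k in (0, 1, 2)])
y = np.repeat([0, 1, 2], 40)

lda = LinearDiscriminantAnalysis()
print(f"LDA mean CV accuracy: {cross_val_score(lda, X, y, cv=5).mean():.2f}")
```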
In any case, I would recommend first plotting the data and visualizing the trends before doing any analysis. Can you spot trends/patterns? Does the separation between classes seem easy or hard? (Visualize or analyze the response against each predictor in a univariate way to explore your data.)
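For example, one simple univariate view (the data and column names below are simulated assumptions) is a boxplot of each factor per rating level; a monotone shift across levels suggests a usable signal:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Hypothetical data: one continuous factor and a 7-level ordinal rating
x1 = rng.uniform(-1, 1, 100)
rating = np.clip(np.round(2 * x1 + rng.normal(0, 0.6, 100)), -3, 3).astype(int)
df = pd.DataFrame({"x1": x1, "rating": rating})

# One boxplot of the factor per rating level
df.boxplot(column="x1", by="rating", grid=False)
plt.suptitle("")  # drop pandas' automatic group title
plt.title("Factor x1 by ordinal rating")
plt.xlabel("rating")
plt.ylabel("x1")
plt.show()
```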
I hope this complementary answer makes sense to you and helps,
Victor GUILLER
"It is not unusual for a well-designed experiment to analyze itself" (Box, Hunter and Hunter)