Solved: Relation between categorical nominal independent variable and continuous dependent variable

pablosfontoura · Apr 25, 2020 05:51 AM

Hi,

I am unsure which statistical method to use to understand the relation between:

- one categorical nominal variable (named "expertise level", with two response options: "expert" and "non-expert")

and

- one dependent variable (named "total duration of fixations")

I would like to check whether a characteristic of a group of people, in this case expertise level (being an expert or a non-expert), influences the total time these people spend looking at things (total fixation duration).

Which statistical method should I use for that? And how should I interpret the results (which key parameter to check and how to interpret it)?

Thank you for your attention.

Mark_Bailey · Apr 25, 2020 09:06 AM

Your data table will have two data columns: Total Duration of Fixation (numeric data, continuous modeling type) and Expertise Level (character data, nominal modeling type). Select Analyze > Fit Y by X. Select Total Duration of Fixation and click Y. Select Expertise Level and click X. Click OK.

Now the choice of the analysis method depends on the nature of the data. The red triangle menu provides commands to perform all the available methods, but the appropriate method depends on the assumptions that each method makes about the data. There are parametric methods such as the Student t-test (equal and unequal variance cases), analysis of variance (ANOVA, equal variance case), and analysis of means (ANOM). There are also many non-parametric tests.
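For readers who want to see what two of these options compute outside JMP, here is a minimal Python sketch using only the standard library. It shows the unequal-variance (Welch) t-ratio with its approximate degrees of freedom, and a permutation test as one example of a non-parametric alternative. The function names and the data values are hypothetical illustrations, not the poster's data and not what JMP runs internally.

```python
import random
import statistics


def welch_t(a, b):
    """Return (t, df) for the unequal-variance (Welch) two-sample t-test."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / se2 ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def permutation_p(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical total fixation durations in seconds (assumed values).
experts = [12.1, 15.3, 9.8, 22.4, 11.0, 30.2]
non_experts = [14.5, 18.9, 25.1, 16.2, 40.7, 13.3]

t, df = welch_t(experts, non_experts)
p = permutation_p(experts, non_experts)
print(f"Welch t = {t:.3f}, df = {df:.1f}, permutation p = {p:.3f}")
```

The permutation test makes no normality assumption, which is why it is a reasonable cross-check when the responses are skewed.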

statman · Apr 25, 2020 11:47 AM

As Mark states, Fit Y by X. Some other thoughts:

1. Make sure you understand practical significance first. How much of a change in "Duration of Fixation" matters to you? How much of a change is of scientific value?

2. For future work, could you create more categories of Expertise (at least some ordinal scale)? The closer to continuous, the more efficient.

3. Plot the data along with any statistical "tests". Does it make sense?

4. How is expertise determined? Measurement error?

5. Do you care about the variation in the Duration?

You can always attach the JMP data set for us to look at.

"All models are wrong, some are useful" G.E.P. Box

pablosfontoura · Apr 25, 2020 07:45 PM

Thank you @Mark_Bailey and @statman, your suggestions were of great help. I managed to create the graph and to use some of the analysis methods. I am just not sure whether I have chosen the right method, and whether I have interpreted the results correctly. Please find below my interpretation and the attached file with the data set.

P.S. The question I am trying to answer is whether expertise influences the time people look at something (for comparison, I added a second variable to my analysis, which is fixation count). More specifically, I would like to know which kind of expertise (expert or non-expert) influences fixation time and count, and how each of them influences it.

My analysis is this:

"In order to verify to what extent expertise has an effect on total fixation time and fixation count, an exploratory linear regression was performed. The analysis of variance indicates that there is no relationship between the total fixation time and expertise (F Ratio = 0.5838; p value = 0.4456). This means that the variable expertise did not influence the time people spent focusing on the images. Similar analysis suggests though that there exists a relationship between the variable expertise and fixation count (F Ratio = 1.3631, p value = 0.2442). This indicates that expertise slightly influences the number of fixations performed."

Thank you once again for your valuable support.

Mark_Bailey · Apr 26, 2020 08:55 AM

Thank you for providing your data in a JMP data table, imported from your Excel workbook.

A distribution analysis shows that you have strongly right-skewed responses. The skew is apparent in the Oneway analysis.

Your hypothesis is that the mean response is higher for experts than for non-experts. The opposite is observed. The difference in the means is -3.878. The appropriate t-ratio is -0.84444 and Prob > t is 0.8.
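To see why a negative t-ratio goes with a large Prob > t, the sketch below computes the upper-tail, lower-tail, and two-sided probabilities for the reported t-ratio. It assumes a standard normal approximation to the t distribution, because the thread does not state the degrees of freedom; JMP itself uses the Student t distribution, so the values will differ slightly. The function name `tail_probs` is a hypothetical helper.

```python
from statistics import NormalDist


def tail_probs(t):
    """Approximate Prob > t, Prob < t, and Prob > |t| with a standard normal.

    Assumption: the normal curve stands in for the t distribution, which is
    only reasonable when the sample is not small.
    """
    z = NormalDist()
    upper = 1.0 - z.cdf(t)               # Prob > t: evidence for experts > non-experts
    lower = z.cdf(t)                     # Prob < t: evidence for experts < non-experts
    two_sided = 2.0 * min(upper, lower)  # Prob > |t|: evidence for any difference
    return upper, lower, two_sided


upper, lower, two_sided = tail_probs(-0.84444)
print(f"Prob > t   ~ {upper:.2f}")
print(f"Prob < t   ~ {lower:.2f}")
print(f"Prob > |t| ~ {two_sided:.2f}")
```

A Prob > t close to 1 means the data give no support to the one-sided hypothesis that experts have the higher mean; it does not by itself show that the opposite difference is significant.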

The responses also exhibit variances that are not the same. I applied the log transform to remove the skew and stabilize the variance. The transformation revealed a few extreme outliers in both responses that are probably exerting very high influence on the statistic compared to the rest of the sample.

The t-ratio is now -1.59778 with a Prob > t of 0.9437.
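The effect of a log transform on right-skewed durations can be checked with a few lines of Python. This is only an illustrative sketch: the values are hypothetical, and `sample_skewness` is a simple moment-based helper written for this example, not the estimator JMP reports.

```python
import math
import statistics


def sample_skewness(xs):
    """Moment-based skewness: mean cubed deviation divided by the cubed population SD."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


# Hypothetical right-skewed fixation durations in seconds (assumed values).
durations = [2.0, 3.0, 3.5, 4.0, 5.0, 6.0, 8.0, 12.0, 25.0, 60.0]
logged = [math.log(x) for x in durations]  # natural log; requires all values > 0

print(f"skewness, raw scale: {sample_skewness(durations):.2f}")
print(f"skewness, log scale: {sample_skewness(logged):.2f}")
```

The log transform compresses the long right tail, so the skewness on the log scale is smaller, though not necessarily zero. Any t-test run on the transformed data compares means of log durations, which should be kept in mind when reporting the result.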

I found essentially the same analysis of the Fixation Count response.

statman · Apr 26, 2020 11:24 AM

Just to add to Mark's output, here is a multivariate analysis of the 2 dependent variables. You did not specify practical significance. The Mahalanobis distance also detects a number of outliers. The outliers could be measurement error, documentation error, or some other noise variables not taken into account (one reason to keep track of time order in your data set).
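For anyone curious how a Mahalanobis distance is computed for two responses, here is a minimal standard-library sketch. The (duration, count) pairs are hypothetical, the helper name is made up for this example, and no flagging threshold is applied, since the thread does not say which limit the JMP platform used.

```python
import statistics


def mahalanobis_2d(points):
    """Return each (x, y) point's Mahalanobis distance from the sample centroid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    # Sample variances and covariance (divisor n - 1)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(S) * [dx dy]^T, written out for a 2x2 S
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2 ** 0.5)
    return distances


# Hypothetical (total fixation duration in s, fixation count) pairs.
# The last pair has many fixations for a short duration, breaking the pattern.
data = [(10, 30), (12, 35), (15, 44), (18, 55), (20, 61), (25, 74), (30, 92), (14, 90)]

distances = mahalanobis_2d(data)
for point, d in zip(data, distances):
    print(point, round(d, 2))
```

Unlike checking each response separately, this distance accounts for the correlation between duration and count, so a point can stand out even when neither of its coordinates is extreme on its own.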

Just playing here... What is it "they" are fixating on? How do you know they are fixating on "it"? How different are the objects being fixated on? Is there any relationship between the object being fixated on and fixation time? Hypothesis: We all have a bias, and the bias may affect fixation time. For example, if there is a certain feature (e.g., color or pattern) that there is a bias towards, perhaps the observer will spend more time on that feature. In the quality inspection world, when an inspector sees one defect or area of defects, this influences where and what they look for in future inspections.

"All models are wrong, some are useful" G.E.P. Box

pablosfontoura · Apr 27, 2020 06:19 PM

Thank you @Mark_Bailey and @statman .

The practical significance question is challenging for me to answer right now because I am still learning how to understand the data. I don’t know yet what would be an appropriate measure of change in visual behavior for experts looking at images of this kind. I will look into it; that is a good point.

In this research I am using eye-tracking data of experts and non-experts looking at 27 different paintings by the same artist. I know they are fixating on the paintings because I used a mapping procedure that filters visual behavior mapped onto snapshots of the paintings. The objects are similar in that they are portraits of people with an abstract background, but they are still 27 different paintings of different sizes, each with its own singularity. Fixation time was free: people had as much time as they wanted to look at the images, so each participant looked at the images for a different amount of time. Expertise was determined by checking people's profiles: professionals in areas unrelated to art analysis were considered non-experts, and professionals in art analysis areas were considered experts. I understand this can create a bias, as it is a subjective parameter.

My first thought was whether expertise could affect the way people look at the images. One way of checking that was comparing where experts and non-experts looked, by examining fixations on different Areas of Interest. For this analysis I used some basic distribution histograms (I did not put this data here). Then I thought about the two other eye-tracking measures I have: fixation duration and fixation count. I was assuming that the overall time people fixate on the paintings and the total number of fixations could be influenced somehow by expertise (for example: the more they look, the more they tend to get information from the images; the less they look, the easier it is for them to inspect the images; the more fixations, the more detailed the visual inspection, etc.). I was checking that on an exploratory basis.

Where people look can be influenced by a lot of things; you are absolutely right, @statman, but I was trying to check whether expertise played a role in the overall visual behavior anyway.

Both of your analyses are great, thank you once again! I just don’t know yet how to extract the outliers from this analysis.

statman · Apr 28, 2020 10:39 AM

To be honest, I really don't understand the "study". Why does it matter how long someone looks at a painting? What are they looking for? How do you know when they found it? In other words, I don't see any efficacy metrics. What do they "have" after they looked at the painting for x amount of time? Are you measuring some sort of brain activity? Do they "rate" the painting after that time?

IMHO your grouping into 2 categories (experts and not) is overly simplified.

Other thoughts:

I think you should expand the study to include more variables (distance from painting, intensity of light, type of lighting, angle of view, additional artists (perhaps from different genres)).
You don't have any way to estimate or separate measurement error (both within and between observers). For example, you don't have the same person observe the same painting twice.

The first step to dealing with outliers is to identify them. There are many ways to do this (some methods already shown above in the thread). Once identified you have to think about why they are "different". I always suggest predicting outliers in your study before you perform the study. What might cause unusual data points? Do this before the study and then observe if any of your predicted "explanations" happen during the study. In many cases, these are the most informative pieces of information. Once you understand why they happened, then you can play the game of removing them from the data set... The objective is NOT to have a great data set (or great graphs, plots or statistically significant effects). The objective is to learn.
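One common way to do the identification step for a single response is Tukey's fences (the rule behind box plot outlier markers). The sketch below is only an illustration with hypothetical durations; the function name and the multiplier of 1.5 are conventional choices, not something prescribed in this thread, and the result is a list of points to investigate rather than to delete.

```python
import statistics


def tukey_flags(values, k=1.5):
    """Return (index, value) pairs lying outside Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical total fixation durations in seconds (assumed values).
durations = [2.0, 3.0, 3.5, 4.0, 5.0, 6.0, 8.0, 12.0, 25.0, 60.0]

flagged = tukey_flags(durations)
print("points to investigate:", flagged)
```

Keeping the index alongside each flagged value makes it easy to go back to the original row (participant, painting, time order) and ask why the observation is different.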

"All models are wrong, some are useful" G.E.P. Box

pablosfontoura · May 1, 2020 07:32 AM

Indeed, there are a lot of things to think about in this study. For now, the information you provided on data treatment and on questioning the data was of great importance. I will work more on it.

Thank you for your attention.
