I'm an Associate Professor of Integrated Marketing and Communication
at the University of Mississippi, also known as Ole Miss.
What I have today is a demonstration of a teaching exercise I use with students.
The title of the presentation is
"Communication Style and Political Campaigns:
Promoting a Personal Connection with an Audience."
The question is,
do some presidential candidates use the first person or the second person
more than others in their tweets on Twitter?
This is an important question because we want to form a personal connection
between a candidate and an audience.
One way they can do that is by the use of language in their social media.
The students manually coded tweets during one week
of the presidential primary season in 2016.
They recorded every tweet
that was issued by all 19 or 17 presidential candidates at this time.
What we're going to demonstrate today is how we can test the probability
of a distribution by using the By grouping variable.
After we recorded 1,107 tweets,
the first thing we're going to test
is whether the use of first person varies by party.
This is a typical ChiSquare test.
Each variable has two levels: political party is Democrat versus Republican,
and first person is either present or absent in the tweet.
You can see the test of the relationship there.
The likelihood ratio is significant.
You can look at the graph
which shows us that Democrats typically used first person
a little more often than Republicans, and it was a significant difference.
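For readers following along outside JMP, here is a minimal sketch of a likelihood ratio test of independence on a two-by-two table. The counts are hypothetical and are not the study's data; the function name g_test_2x2 is also just an illustration. It relies on the fact that a ChiSquare with one degree of freedom has an upper tail equal to erfc(sqrt(G/2)).

```python
import math

def g_test_2x2(table):
    """Likelihood ratio (G) test of independence for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if observed > 0:  # a zero cell contributes nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # A 2x2 table has 1 degree of freedom; upper tail = erfc(sqrt(G/2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# HYPOTHETICAL counts, made up for illustration only.
# Rows: Democrat, Republican. Columns: first person absent, present.
counts = [[60, 40], [140, 60]]
g, p = g_test_2x2(counts)
print(f"G = {g:.2f}, p = {p:.3f}")
```

A small p-value here would play the same role as the significant likelihood ratio in the JMP report.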
Now let's get on to the second person.
You can do the same thing. Look at the candidate
and the presence or absence of the second person in the tweet.
You'll see also that it's a significant relationship.
The likelihood ratio value is 83.7, and it is significant.
Then you look at the graph and you see, well,
some people obviously used the second person more than others,
but which ones were really different from the others?
You can look at the contingency table,
and in the contingency table, you look across the rows, you'll see
how often each candidate used the second person.
For example, Ben Carson used it in 4% of his tweets.
Chris Christie used it in about 25 or 26% of his tweets, and so on.
We see Hillary Clinton used the second person about 16 or 17% of the time
in her tweets during that week.
But what we want to be able to do
is test that specific probability, or the probability of that distribution.
Is Hillary Clinton's distribution of 17% and 83% really different
from the overall average of all the political candidates?
If you look at the bottom of the contingency table,
you'll see that the distribution really was 80% and 20%.
But you can also find this information with Distribution.
Look at Analyze, then Distribution,
and we put the variable in the Y box and click OK.
You'll see the frequencies, or
the probabilities: the distribution is 80 and 20.
It's about 80% and 19.9%, which I round to 80% and 20%.
What we want to know is if Hillary Clinton and other candidates
use the second person more or less than this average.
We're not looking at a 50/50 test, we're looking at an 80 versus 20 test.
To do this,
we are going to use the By box or the By field.
To subdivide this distribution by each candidate,
we're going to put the variable candidate in the By box.
We still have our dependent variable in the Y box,
the use of the second person,
but we're going to subdivide it by the variable candidate,
which will produce a unique, individual test for each one of the candidates.
When you look at this, you'll get a result for each candidate.
For example, Ben Carson first,
and then Chris Christie second, and so on for each one of the candidates.
It'll tell us the same information that we
have in the contingency table with the little graph.
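The idea of subdividing one distribution by a grouping variable can be sketched in a few lines of Python. The records below are hypothetical placeholders, not the coded tweets, and distribution_by is an illustrative name rather than anything from JMP.

```python
from collections import defaultdict

# HYPOTHETICAL coded tweets: (candidate, second_person_present)
tweets = [
    ("Candidate A", True), ("Candidate A", False),
    ("Candidate A", False), ("Candidate A", False),
    ("Candidate B", True), ("Candidate B", True),
    ("Candidate B", False), ("Candidate B", False),
]

def distribution_by(records):
    """Return {group: (absent_share, present_share)} for each grouping value."""
    counts = defaultdict(lambda: [0, 0])  # [absent, present]
    for group, present in records:
        counts[group][1 if present else 0] += 1
    return {
        group: (absent / (absent + present), present / (absent + present))
        for group, (absent, present) in counts.items()
    }

by_candidate = distribution_by(tweets)
for name, (absent, present) in by_candidate.items():
    print(f"{name}: absent {absent:.0%}, present {present:.0%}")
```

Each entry corresponds to one of the per-candidate reports that the By box produces.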
But what we want to know is if this distribution is different
from the 80-20 distribution
that we have for all of the candidates overall.
To do this, we look at the person that we're
interested in, in this case, Hillary Clinton,
and we see that the probability of the distribution is 83% and 17%.
We go up to where it says second person, the name of the variable,
and click on the drop-down menu, the red triangle,
and we find the command Test Probabilities.
We're going to click on Test Probabilities, and a new dialog box opens up.
This dialog box lets us establish our own benchmark.
Rather than testing it against 50/50,
we're going to test it against 80 and 20.
I type in 0.8 and 0.2 because that's what we're testing.
I leave the setting at a two-tailed test
because I don't know if it's going to be higher
or lower than 80/20 when I test these distributions.
But I put in my benchmark of 80% and 20%, which I got from the contingency table
or from the overall distribution of the use of second person.
Then we click done.
Here's what we have. This is part of the results.
You'll see that she had 96 tweets.
Of those, 83% did not have the second person, 17% did have second person,
and we're testing it against the distribution of 80/20.
The likelihood ratio or the ChiSquare value is 0.69
and the P value is not significant.
Her use of the second person did not vary significantly
from the overall group average of 80/20.
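As a worked check, the likelihood ratio statistic can be written out by hand. The counts of 80 tweets without the second person and 16 with it are an assumption, inferred from the 96 tweets and the roughly 83%/17% split; they are not read from the output. Here O is an observed count, E is the count expected under the benchmark, n is the number of tweets, and p is the benchmark probability.

```latex
G = 2 \sum_{i} O_i \ln\frac{O_i}{E_i}, \qquad E_i = n\,p_i

% Assumed counts: O = (80, 16), n = 96, p = (0.8, 0.2), so E = (76.8, 19.2)
G = 2\left[\, 80 \ln\frac{80}{76.8} + 16 \ln\frac{16}{19.2} \,\right] \approx 0.697
```

With two categories there is one degree of freedom, and a value this far below the 0.05 critical value of 3.84 is not significant, which agrees with the result above.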
Let's try somebody else. We do the same thing.
This time we'll do it for Bernie Sanders.
He had 150 tweets that week.
You'll see that he used the second person only about 5% of the time.
We test that against the 80/20 distribution
of the overall group of politicians,
and we see that the ChiSquare is significant: it's 29.7 or 29.8,
and the P value is less than 0.0001.
So yes, his distribution or his use of the second person significantly varied,
but in this case it was significantly less,
only 5% compared to the overall average of 20%.
It's significantly less for him.
Let's try someone else.
Marco Rubio was a presidential candidate in 2016,
and he used the second person about 24% of the time.
We test that again against the 80/20 distribution,
and we see that his ChiSquare value for this test is 0.88,
and it is not significantly different from the overall average,
a distribution of 80% and 20%.
His use of the second person did not differ significantly
from the overall average of all the candidates.
We'll look at another one. Here's Donald Trump.
He had 105 tweets during that week,
and you see that he used second person about 30% of the time,
which means about 30% of the time he was saying "you" or "you all"
or some form of that second person in his tweets.
We want to test that against a distribution of 80 and 20%.
The likelihood ratio is significant.
The ChiSquare value is 6.4, almost 6.5,
and the P value or the significance level is 0.01.
The test suggests that he used the second person
more often than most of the candidates
who were running during the primary season in January 2016.
This is a way we can test each one of those rows.
At the beginning of the 2016 primary season,
we see that Hillary Clinton and Marco Rubio used the second person
about as much as everybody else did in the electoral season.
Bernie Sanders used the second person significantly less,
and Donald Trump used the second person significantly more.
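The three follow-up tests can be reproduced roughly with a short script. This is a sketch under stated assumptions: the counts are inferred from the tweet totals and approximate percentages mentioned above (they are not taken from the data file), the 0.05 cutoff is an assumed significance level, and likelihood_ratio_test is an illustrative name, not a JMP function.

```python
import math

HYPOTHESIZED = (0.8, 0.2)  # (second person absent, present): the overall split

def likelihood_ratio_test(observed, hypothesized=HYPOTHESIZED):
    """Two-category likelihood ratio test of counts against fixed probabilities."""
    n = sum(observed)
    g = 0.0
    for count, prob in zip(observed, hypothesized):
        if count > 0:  # a zero count contributes nothing to G
            g += 2.0 * count * math.log(count / (n * prob))
    g = max(g, 0.0)  # guard against tiny negative rounding error
    # Two categories give 1 degree of freedom; upper tail = erfc(sqrt(G/2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

# ASSUMED counts (absent, present), inferred from the reported totals and
# approximate percentages; not read from the original data.
counts = {
    "Hillary Clinton": (80, 16),   # 96 tweets, about 17% second person
    "Bernie Sanders": (143, 7),    # 150 tweets, about 5%
    "Donald Trump": (73, 32),      # 105 tweets, about 30%
}

ALPHA = 0.05  # assumed significance level

results = {}
for name, (absent, present) in counts.items():
    g, p = likelihood_ratio_test((absent, present))
    share = present / (absent + present)
    if p >= ALPHA:
        verdict = "about the same"
    elif share > HYPOTHESIZED[1]:
        verdict = "more"
    else:
        verdict = "less"
    results[name] = (g, p, verdict)
    print(f"{name}: G = {g:.2f}, p = {p:.4f}, second person used {verdict}")
```

Under these assumed counts the statistics land close to the values read from the reports, which is a useful sanity check on the 80/20 benchmark.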
This is a way to do a follow-up test on a Chi Square
when you need to test the distribution of individual rows.
You can do this using the By box,
which you use to subdivide the distribution.
The option to test the probability of a distribution allows us to set
a benchmark, comparison, or reference group to something other than 50/50,
or generally to whatever we might be looking at.
In this case, we set it to 80/20.
This is a way to do follow-up tests
on a significant Chi Square by testing the probability of a distribution.
I'm Robert Magee at the University of Mississippi,
and if you have any questions, there's my email address, feel free to reach out.
Thank you very much.