Carson contra Trump

From what I have monitored on Twitter, Ben Carson may as well not be running for President. In the primary stream of election Tweets his name is nearly non-existent, and the Tweets that feature him tend to be poorly formatted. To be very specific, Carson is mentioned in less than .05% of all Twitter traffic related to the main election hashtag.

It would be something of a surprise if he were tied with Trump as, well, he has basically no social media following.

But what does it mean for Carson to pull even with Trump? The following list should be taken as a probability assessment without evidence.

A. It could confirm my suspicions so far: that Trump’s social media support, like most election activity, is autopoietic, with hacks and robots arguing with and persuading each other in a self-congratulatory pablum. This can be true in combination with the other explanations.

B. Perhaps people really like Ben Carson. This is not an entirely unlikely scenario, but it seems to overlook a great deal of other polling that suggests that Trump is strong. It is also difficult to take Carson as the candidate in this role as he, like Trump, is so deeply attached to the media. In this case, Carson is a favorite of Fox News. This could also explain why he has such a small social network footprint: support for Carson or Trump may break based on media preference. If Trump leads Republicans with smart phones and Carson those with cable, who will win the newspapers, and for that matter the bartenders?

C. The polls could be faulty, especially individual polls this early in the contest.

I am sure we will get more clarity at some point, but as it stands now, this is just more detritus in a murky stream.

Modeling #Squadgoals: Finding The Squads

#Squadgoals is an index of fandoms often ignored by the popular press. This computational approach mines the use of this hashtag, with all the possibilities and pitfalls inherent in that method.

 

If you are not familiar with youth culture, this post won’t be particularly meaningful for you. Even some folks I know who are tuned in don’t know about squad goals. Such as…

#squadgoals, see also: cheetah girls

Things you need to know: fan cultures are organized around references to a central identity or fandom, with other lesser elements organized around that. In this case, one could be a part of the Cheetah Girls fandom, and thus identify with the imaginary community of the squad. When the entire operation is well-oiled, you are, in fact, on point.

But what are the most important squads today? Clearly the Cheetahs were important a decade ago but their continued squad-ness depends on nostalgia – not a new fandom. My students (Oregon State, Survey of Social Media) really wanted to know more about the squads, especially which ones were dominant on Twitter.

Method

Our method? We scraped Twitter for all uses of the hashtag #squadgoals, which is frankly more interesting than #relationshipgoals or even just #goals. We then used Mallet to do topic analysis of the resulting scraped Tweets.

There was one big problem: Twitter isn’t exactly reliable. There were over five thousand squadgoals tweets over the past thirty-six hours. When we asked for the last 200,000, the API reported that it could only return just under 36,000 tweets and that those only went back a week. Issues with Twitter are known: if one wants real longitudinal Twitter information, one needs to observe over time. There is no last-second or retroactive research solution.

Also, here is a bigger problem. Any sort of token analysis like this is vulnerable to noise. A robust stopwords file can filter the results for better analysis. The choice of stopwords is fraught with danger: as the stopwords file improves, the topics assigned and the model will appear to fit the data, or at least the researcher’s sensibility of the data, more and more closely. There is a real risk that a researcher using a frequency table and a stopwords file could sculpt a computer reading of a document set that fits their needs. Of course, all research can fall prey to the problem of heuristic availability – some researchers go hunting for a significant p value, and there are many possible sins to be committed here. For the purposes of this project, I used my main stopwords file, stopwords2.txt, which I can provide if you would like it. As an aside: I do think that the development of stopwords files is an important topic for critical cultural studies, especially as practitioners deploy computer listening, reading, and vision.
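To make the stopwords point concrete, here is a minimal sketch in Python (not the Mallet workflow used for this project). The tiny stopword set and the sample tweets are invented for illustration, and stopwords2.txt is not reproduced here. The sketch only shows how the same tweets yield a different frequency table once a stopword list is applied, which is exactly where the sculpting risk comes in.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration only; this is NOT stopwords2.txt.
STOPWORDS = {"the", "a", "is", "my", "rt", "and", "this", "with"}

def tokenize(text):
    """Lowercase a tweet and split it into word, hashtag, and mention tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())

def term_frequencies(tweets, stopwords):
    """Count tokens across all tweets, skipping anything in the stopword set."""
    counts = Counter()
    for tweet in tweets:
        counts.update(t for t in tokenize(tweet) if t not in stopwords)
    return counts

# Invented sample tweets.
tweets = [
    "RT this is my squad #squadgoals",
    "The squad with the band #squadgoals",
]

raw = term_frequencies(tweets, set())           # no filtering
filtered = term_frequencies(tweets, STOPWORDS)  # after the stopword pass

print(raw.most_common(3))
print(filtered.most_common(3))
```

Every word added to the stopword set changes what the model can see, so it is worth keeping the raw table next to the filtered one.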

Unfortunately, I am not going to be renewing my Tableau license until October, and Microsoft Excel is about as useful as a hammer for polishing a glass menagerie in situations like this, so R-driven graphics will be coming your way.

Results

Results expressed as a dendrogram:

Squads dendrogram

 

As the chart cascades down, smaller topics are broken from the larger topics, or more precisely, the topics that are lowest in the tree are those first collapsed into the larger topic. These are the dominant squads of the third week of August 2015.
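For readers who want to see how a tree like this gets built, here is a rough sketch of agglomerative merging in Python. It rests on assumptions: I am not stating which distance or linkage produced the chart above, so this uses Jaccard distance between each topic's top words with single linkage, and the three topics and their word sets are invented.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 minus the share of words two topics have in common."""
    return 1 - len(a & b) / len(a | b)

def merge_order(topics):
    """Repeatedly merge the two closest clusters (single linkage).

    topics: dict mapping a topic label to a set of its top words.
    Returns a list of (cluster_a, cluster_b, distance) in merge order.
    """
    clusters = {label: [label] for label in topics}  # cluster name -> member labels
    merges = []
    while len(clusters) > 1:
        def dist(x, y):
            return min(
                jaccard_distance(topics[i], topics[j])
                for i in clusters[x]
                for j in clusters[y]
            )
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: dist(*p))
        merges.append((a, b, dist(a, b)))
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
    return merges

# Invented topics and top words, for illustration only.
topics = {
    "band_a": {"tour", "album", "fans", "love"},
    "band_b": {"tour", "album", "fans", "concert"},
    "sports": {"game", "win", "team", "fans"},
}

merges = merge_order(topics)
for a, b, d in merges:
    print(a, "+", b, round(d, 3))
```

The pair merged first sits lowest in the tree, and each later merge joins a larger branch higher up.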

Squad Ranks

Results as a list with pictures.

1. One Direction (trying to figure out the relationship to Timberlake)

Screenshot 2015-08-25 15.25.52

2. Walking Dead

3. Sports/Yankees

4. 5 Seconds of Summer

5. Hottopic

6. Hunger Games

7. Fifth Harmony. (Taylor Swift just officially added Fifth to her squad, btw.)

8. Grey's Anatomy

Grey's

9. Summerslam

Summerslam

10. (tie) Outfithaven/Clothes Hack, feministajones

11. NFL Kicker Pat McAfee

12. Little Mix from X-Factor season 8 (England)

HOLD THE TRUCK

What if we look at the squad goals that were the most popular on an individual basis? Then our key squad is:
Scrubs

 

#teamturk, Scrubs 4 Ever. Etc.

This method of sorting reveals another problem: this particular Braff tweet was listed 17 times in the data, just as Braff. That suggests that there are other problems, and the lack of status text drops this powerful image behind the text-rich posts related to One Direction.
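A quick way to spot this kind of repetition is to count tweets after light normalization. The sketch below is an illustration in Python, not the script used for this post. The normalization rule (drop a leading "RT @user:" prefix, lowercase, collapse whitespace) is my assumption, and the sample tweets and usernames are invented.

```python
import re
from collections import Counter

def normalize(text):
    """Drop a leading 'RT @user:' prefix, then lowercase and collapse whitespace."""
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text.strip(), flags=re.IGNORECASE)
    return " ".join(text.lower().split())

def duplicate_counts(tweets):
    """Count how many times each normalized tweet text appears."""
    return Counter(normalize(t) for t in tweets)

# Invented sample data; the usernames are hypothetical.
sample = [
    "#squadgoals",
    "RT @example_user: #squadgoals",
    "RT @other_user: #SquadGoals",
    "My squad at the game #squadgoals",
]

counts = duplicate_counts(sample)
print(counts.most_common(1))
```

A tweet that is mostly an image carries almost no text, so it can top a count like this while contributing nearly nothing to a topic model.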

By this ranking method, our squads are:

Scrubs, Star Wars (Vader), Napoleon Dynamite, Guardians of the Galaxy, Blue Mountain State (a television program on Spike TV, the network for men), Taylor Swift, some sort of poorly edited image, Eid, and then One Direction.

What did we learn?

Mapping the squads is difficult. Automated means allowed us to see past Braff’s raw numerical superiority. In a world where a third of the dataset could have possibly been tied to Braff’s tweet, it seems possible, if not likely, that large swaths of the data could have been lost in the process of building this model. I believe that the problem here is not with our approach or tools, but with Twitter itself. It was not our software, but the API that was exhausted. Perhaps this is the truth of the squad: it exists on the level of the aspiration and the imagination, not on the level of the database.

Raw Twitter Activity is NOT a Poll

I have been actively tracking election activity on Twitter for two weeks now. The impact of strategic communication on Twitter is very clear. There are sockpuppets and shills everywhere. They eclipse organic use of the network. Here is a chart:

Dendrogram with no cleaning

Well, I guess this is what you see when hundreds of users retweet the same content basically simultaneously and the entire network seems designed around communicating one perspective. That chart is garbage. Inferences made based on that chart are highly suspect.

It’s cleaning time.

Some of the first users I removed were clearly sockpuppets. One user sends dozens of posts per hour including pictures from Victoria’s Secret, Bob Marley, popular image macros (such as keep calm memes), and retweets of popular figures like Joel Osteen, along with some libertarian content. This seems like it could be the work of a robot as it maps what would be popular content.
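One rough first pass at this kind of cleaning is to flag accounts by posting rate. The Python sketch below is only an illustration: the threshold of 12 posts in a clock hour is a number I made up, the account names and timestamps are invented, and a high rate alone does not prove an account is automated.

```python
from collections import defaultdict
from datetime import datetime

def peak_posts_per_hour(records):
    """Return {user: most posts seen in any single clock hour}.

    records: iterable of (user, datetime) pairs.
    """
    buckets = defaultdict(int)
    for user, ts in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(user, hour)] += 1
    peak = defaultdict(int)
    for (user, _hour), n in buckets.items():
        peak[user] = max(peak[user], n)
    return dict(peak)

def flag_high_volume(records, threshold):
    """Users whose busiest hour meets or exceeds the threshold."""
    return {u for u, n in peak_posts_per_hour(records).items() if n >= threshold}

# Invented sample: one account posts 24 times in an hour, another posts twice in a day.
records = [("bot_like", datetime(2015, 8, 20, 12, m)) for m in range(24)]
records += [
    ("casual", datetime(2015, 8, 20, 12, 5)),
    ("casual", datetime(2015, 8, 20, 15, 30)),
]

flagged = flag_high_volume(records, threshold=12)  # hypothetical cutoff
print(flagged)
```

Flagged accounts still deserve a manual look before removal, since the choice of threshold shapes the cleaned chart as much as a stopwords file shapes a topic model.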

Sorting out what it means for such a large percentage of links to include rich media is also difficult. Could half of people want to share an image with every conversational tweet, or is this just another indicator that this is an artifact of autotelic speech?

Here is a cleaner chart:

Cleaner

Suddenly Scott Walker is everywhere because of an extreme position on immigration, Trump is in the pack, and the polls don’t seem to matter. The top two clusters are still a bot-retweeted tweet and a counter-effort to that tweet. This is not an organic conversation network.

In short, Twitter is not a poll. Twitter is a microbroadcasting network utilized by a variety of groups to create the illusion that they enjoy broad popular support. Although this will not curtail research on Twitter, or the coverage of Twitter by the legacy media, it is important to remember that it isn’t a transparent window onto the social now, but a contested rhetorical field.

The alternative to the consideration of Twitter as a rhetorical struggle is to suppose that a spammer retweeting pictures from Victoria’s Secret and cool pictures of guitars is an expression of genuine politics.

Text Mining Trump

To this point in the election, the biggest surprise has been the insurgent candidacy of reality television star Donald Trump. The strength of his candidacy has clearly hinged on two things: his ability to project his views independent of the legacy media (his reach is astounding via Twitter) and his positioning as an insurgent right wing candidate against the orthodoxy of party media and discipline. For all the attachment to party structures and the historical right, there is a spirit of rebellion to various Tea Party groups. The main hashtags organizing Trump’s online presence have been #Trump2016 and #makeamericagreatagain. As a part of this project, I have been tracking the basic Trump hashtag for some time.

One aspect of Trump analysis could take the form of network mapping. This approach would either create a pseudo-network of users to terms or, my favorite approach, map the @ network of a large slice as a conversation network. Another approach is to use computer topic modeling to read all of the Tweets. This post will do that, using Mallet deployed through R. This is a rough topic model of all #Trump2016 material so far.
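For the @ network option, the basic move is to turn each tweet into edges from its author to the accounts it mentions. Here is a minimal sketch in Python rather than the R setup used for this post. The (author, text) input shape, the usernames, and the tweets are all invented for illustration.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Build a weighted edge list of who mentions whom.

    tweets: iterable of (author, text) pairs.
    Returns a Counter keyed by (author, mentioned_user), ignoring self-mentions.
    """
    edges = Counter()
    for author, text in tweets:
        source = author.lower()
        for target in MENTION.findall(text):
            target = target.lower()
            if target != source:
                edges[(source, target)] += 1
    return edges

# Invented sample data; the usernames are hypothetical.
tweets = [
    ("alice", "@bob did you see this? #Trump2016"),
    ("alice", "RT @bob: the people are speaking"),
    ("carol", "@alice @bob no"),
]

edges = mention_edges(tweets)

# Total mentions received by each account.
in_degree = Counter()
for (_source, target), weight in edges.items():
    in_degree[target] += weight

print(edges)
print(in_degree.most_common(1))
```

Note that this treats a retweet as a mention of the original author, which is one reason retweet storms dominate a map built this way.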

I will write a more complete methodological entry about my approach here later. Right now, I have some processing power and software issues, which my University will resolve soon. Also, building libraries of stopwords is tricky, as little phatic and coordinating moments are of great interest for critical/cultural research, which generally hasn’t been the core audience for topic modeling systems.

 

The Data

At first these topic lines may seem silly, but they do represent connections between terms across the tweets recognized by the topic modeler. This dendrogram shows how topics are merged together to represent the entire Trump dataset.

dendrotrump1

As you can see, Clinton and Biden make it into the topic labels, as does one Republican, in the appropriately named label: don’t like bush.

trump1

This is the analysis of the flow of those patterns so far. Notice the large gray blob in the center – this is a field of activity basically run by danscavino, a former Trump advisor. During the time around the debate he basically was the Twitter conversation. The olive green topic appears later and is now important. The message of these tweets: the American people are speaking. Much of this blocking comes from retweet storms where users retweet or redeploy an image macro or link. The impact of Twitter shortened URLs in particular has fallen off; this is the popsicle orange color in the lower half of the chart.

An analysis of the data as a network might give some additional information about diffusion and network structure. Given the inclusion of television programs and personalities in the list of topic labels, it seems possible, if not likely, that this approach to topic modeling has also identified key figures in the discourse network.

If other research about the dimensions of the cultural position of conservative populism is any guide, the deployment of the “people speaking” in the Twitter stream suggests that the rhetorical frames of the Tea Party have fused with the Trump campaign. The hostility of these frames toward the traditional steering media of the conservative public sphere, combined with the demand for real data/polls, suggests that the underlying argument that Trump trumps pundits may have real resonance. Over in the “who is winning” tab a similar analysis suggests that Trump on immigration is a central category, and that Sanders and Clinton are more likely to appear as mineable topics than other candidates.

Does this mean that Trump could win the nomination? Unknown. Could it be that we should take a step back and honestly measure if we think that Twitter is a proxy for real public affect? Yeah, that would be really important. Furthermore, this approach does not attempt sentiment analysis, so it is possible that many of these tweets may actually be negative for Trump. This mismatch between intensity and valence was a major issue for the Romney campaign in 2012. Remember, they were winning Twitter after all. Or at least, that is what they said.

Trump Up

To this point in the election, there has been no interesting news on the Democratic side apart from the rumors that Biden may enter the election. From a social media perspective, I have yet to see much by way of O’Malley activity. Unlike his Republican counterparts, he may actually gain from the primary/caucus process as people might recognize his name more than they did before. Once we start talking about a Biden-Clinton primary race, then there might be some interesting activity on Twitter. Bernie Sanders has some real Twitter energy, although this has yet to translate into the polls in any meaningful way.

On the Republican side, there is more action. Six weeks ago, the race was unclear. Then Donald Trump happened. This is a welcome development, not because of the normative policy positions of Trump, but because of his propensity to shatter the silly horserace narrative. No, the people of Iowa never had a love affair with a pro-choice New Jersey governor. Stories touting his “narrow path” to the nomination were mendacious. Pundits arguing that Trump would simply fade away should be viewed with suspicion. There is no historical example of a candidate who simply disappeared at this point in the race. Trump is an energy-candidate who is a short-circuit in the Republican approach to harnessing anger. Instead of allowing rage toward Liberals, minorities, women or other groups to remain contained yet channeled behind a marginally electable candidate, Trump is directly charged by this energy, rather than carefully surfing the wave. (My personal pick a month ago had been a brokered convention with Romney winning.)

Rage could be a non-renewable resource. Allowing the extreme right wing to attack the center right, and allowing divisive public debates on issues ranging from sexual violence to torture to happen in an unrestrained, fully bombastic mode, might change the affective landscape. Trump is an affective natural gas flare at an oil well. What he burns now to accelerate the pace of his campaign could change the future affective well of the race. Notice the range of attempts to burn even larger affective reserves: Cruz with a bacon machine gun, Huckabee making a comment about ovens, Christie vocalizing his desire to punch the teachers’ union in the face. I am sure there are more. The point: the affective tone is a cacophony, loud and dissonant. Adding more shocking, outrageous bits could have little impact.

Here is some data:

0000000_1438321440000_1438753440000_12

As you can see, immigration (#tntvote is pro-comprehensive immigration reform) is running relatively even with other major hashtags on the mainline, #election2016. Total reach for this hashtag is just under 35 million.

#Trump2016 on the other hand has a reach of nearly 45 million alone. This is important. Major hashtags for the other candidates have yet to mature.

As the flow of information develops through the debates, it seems likely that Trump will only get stronger.