Don’t try to start a Twitter wire service

Part of my research includes the scraping and visualization of bulk Tweets. I end up seeing a lot of sentences this way, far more than I will ever actually read. I use MALLET to text-mine my Twitter collections.

I have noticed a number of groups attempting to create what appear to be wire services, either with regional handles or with candidate-specific Twitter streams, such as the candidate_news_network and POLS. These are messing up my analysis because they pump out huge volumes of uniform text with little relevance.

Here is my line on these enterprises: they are a false start at best and generally junk.

This is what Wikipedia calls a junk pile.

Does anyone really care what a brand new wire service writes in a microblog format? I can’t believe that any real fan of a candidate needs a curated Tweet stream when they can have the real thing.

This is Twitter’s problem in a nutshell: the news services are a poor substitute for a poor substitute for real news and analysis. Are these services intended for “novice” users? Newsflash: there aren’t many of those. Twitter is slowly burning out, not bringing in new users. My best guess is that these news services are intended to accumulate followers and then sell out. Sort of like Twitter should have, long before that IPO thing.

What does this mean for me? Finding ways to clean this stuff out of my dataset, blerg.
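One way this kind of cleanup could look, as a minimal sketch rather than the method actually used here: flag accounts that post a lot and whose tweets are nearly identical once URLs and mentions are stripped, then drop them before modeling. The `user` and `text` keys, the function names, and both thresholds are assumptions made for illustration and would need tuning against a real collection.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def normalize(text):
    """Lowercase the text and strip URLs, mentions, and extra whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.lower().split())


def flag_uniform_accounts(tweets, min_tweets=50, max_distinct_ratio=0.2):
    """Return the set of users who post high volumes of near-uniform text.

    `tweets` is assumed to be an iterable of dicts with hypothetical
    "user" and "text" keys. Both thresholds are arbitrary starting points.
    """
    by_user = defaultdict(list)
    for tweet in tweets:
        by_user[tweet["user"]].append(normalize(tweet["text"]))

    flagged = set()
    for user, texts in by_user.items():
        distinct_ratio = len(set(texts)) / len(texts)
        if len(texts) >= min_tweets and distinct_ratio <= max_distinct_ratio:
            flagged.add(user)
    return flagged


def drop_flagged(tweets, flagged):
    """Keep only tweets from accounts that were not flagged."""
    return [t for t in tweets if t["user"] not in flagged]
```

A low distinct ratio catches templated feeds that differ only in the link they attach, while the volume floor keeps ordinary users who repeat themselves once or twice from being thrown out.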

Person of the Hour?

The basic cluster dendrogram over the last three weeks reveals three distinct sub-topics.

There are three distinct clusters.

The first category is a widely re-tweeted message:

#WakeUpAmerica✅DEMAND✅VOTER✅IDENTITY✅ INTEGRITY#TCOT#YCOT#PJNET#COSProject#Election2016

There are slight variations that include some other references such as “@truethevote,” but the general purpose of this category seems to be a claim related to voter fraud. The poster is a part of an active Twitter network that circulates Tea Party-related messages. This network is very small and highly active; if I re-modeled to exclude retweets, they would likely fall away entirely. There is little doubt that this network is quite artificial. This particular network seems to like Trump and Carson, and strongly dislike Clinton, Rubio, and Bush. They don’t seem to have a lot of activity related to Fiorina or Sanders.
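Excluding retweets before re-modeling could be sketched roughly as follows. This assumes each tweet is a dict with a `text` key and, where the collection came from the Twitter API, possibly a `retweeted_status` field; the function names are hypothetical, and the "RT @" prefix check is only a heuristic for older-style manual retweets.

```python
import re

# Manual retweets conventionally begin with "RT @username".
RT_PREFIX = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)


def is_retweet(tweet):
    """Guess whether a tweet dict is a retweet.

    Checks for a non-empty "retweeted_status" field first (assumed to be
    present only on API-sourced retweets), then falls back to the text prefix.
    """
    if tweet.get("retweeted_status"):
        return True
    return bool(RT_PREFIX.match(tweet.get("text", "")))


def without_retweets(tweets):
    """Return only the tweets that do not look like retweets."""
    return [t for t in tweets if not is_retweet(t)]
```

Hand-copied messages that drop the "RT" prefix would slip through this filter, so it is a first pass rather than a complete fix.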

The second topic seems to be a cluster on other candidates, with one side twisting toward Walker and Sanders. Fiorina has an entire sub-section, with no references to her outside of that sub-section.

The last cluster of the dendrogram includes references to Bush, Trump and Carson.

Clearly there is a strong anti-Clinton sentiment and a strong Tea Party resonance, but the real flow of Tweets seems to be related to the idea of a pure populist rage that occasionally includes references to particular candidates. This is not candidate advocacy.

I may need to find a new approach to cleaning the data…