Authors explain data sets related to the Mike Brown shooting part 2

  • Type: Audio
  • File format: mp3
  • File size: 6 MB

Full Description

James: So that was the network analysis we did of the Michael Brown shooting and its interconnection with the Eric Garner incident. That was an interesting way into our project. So here’s the second phase: these are the networks we created around the 2016 election debates between Donald Trump and Hillary Clinton. What were your thoughts on this?

Jeff: For me the second data set was the first really big “aha” moment. It also shows why you do studies in the first place: you might find something that surprises you, maybe even something contrary to what you expected. I remember going into the data collection for this second set thinking about the whole “fake news” phenomenon. I was expecting to see how much manipulation there was, and how much Russian interference through bots and the like was behind Trump’s popularity on Twitter. However, we had difficulty operationalizing fake accounts at first. One way was to look for Cyrillic script in the tweets, but that’s not foolproof, and you can still miss fake accounts. But all of that turned out not to be the point after all. What we found […] was that all of these unrelated hashtags were being hijacked to boost the popularity of pro-Trump posts. For instance, one could boost a post about Trump by including unrelated but popular or trending hashtags, such as the iPhone 7 hashtag, since the phone was released around that time.
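The Cyrillic-script check Jeff mentions can be sketched as a simple character-range test. This is a minimal illustration, assuming tweets are available as plain strings; the Unicode range (the basic Cyrillic block, U+0400 to U+04FF), the function names, and the sample texts are our own assumptions, not the project’s actual rule. As Jeff notes, such a flag is a weak signal at best.

```python
import re

# Assumption: only the basic Cyrillic block (U+0400 to U+04FF) is checked;
# supplementary Cyrillic ranges are ignored in this sketch.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def contains_cyrillic(text):
    """Return True if any character in text falls in the basic Cyrillic block."""
    return CYRILLIC.search(text) is not None


def flag_tweets(tweets):
    """Flag each tweet that contains Cyrillic characters.

    A flag is only a weak signal: many genuine users write in Cyrillic,
    and a fake account can just as easily post in Latin script only.
    """
    return [contains_cyrillic(t) for t in tweets]


if __name__ == "__main__":
    sample = [  # invented example texts
        "Big rally tonight! #MAGA",
        "Голосуйте! #MAGA",
        "Watching the debate tonight",
    ]
    print(flag_tweets(sample))  # [False, True, False]
```

Because the test can both miss fake accounts and flag genuine ones, it works better as a filter for closer reading than as a classifier.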

This was how pro-Trump tweets were getting more traction on the network despite being outnumbered. There was a concerted effort among many pro-Trump posters to do this. They did it with pro-Hillary hashtags too, hijacking them by pairing them with pro-Trump hashtags and messages. Pro-Hillary posters did not do this, and their networks were smaller and more dispersed as a result.
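The hashtag hijacking described here shows up in the data as co-occurrence: a poster from one camp attaches the other camp’s hashtags, or an unrelated trending tag, to their own message. A minimal sketch, assuming each tweet has already been labelled with the poster’s camp (for example, through manual coding); the camp labels, the hashtag sets, and the `#iphone7` spelling are hypothetical stand-ins, not the study’s coding scheme.

```python
from collections import Counter

# Hypothetical hashtag sets for illustration; not the study's coding scheme.
PRO_TRUMP = {"#maga", "#trumptrain"}
PRO_HILLARY = {"#imwithher", "#strongertogether"}
UNRELATED_TRENDING = {"#iphone7"}
OPPONENT_TAGS = {"trump": PRO_HILLARY, "hillary": PRO_TRUMP}


def extract_hashtags(text):
    """Return the lower-cased hashtags in a tweet, minus trailing punctuation."""
    return {w.lower().rstrip(".,!?:;") for w in text.split() if w.startswith("#")}


def hijacking_counts(tweets, opponent_tags=OPPONENT_TAGS, trending=UNRELATED_TRENDING):
    """Count, per poster camp, tweets that borrow an opposing camp's hashtag
    or an unrelated trending hashtag.

    tweets: iterable of (camp_label, text) pairs; the camp label is assumed
    to come from an earlier coding step.
    """
    counts = {}
    for camp, text in tweets:
        tags = extract_hashtags(text)
        tally = counts.setdefault(camp, Counter())
        if tags & trending:
            tally["unrelated_trend"] += 1
        if tags & opponent_tags[camp]:
            tally["opponent_hashtag"] += 1
    return counts


if __name__ == "__main__":
    sample = [  # invented example tweets
        ("trump", "Get out and vote #MAGA #iPhone7"),
        ("trump", "Wake up #ImWithHer voters! #MAGA"),
        ("hillary", "Proud to vote today #ImWithHer"),
    ]
    print(hijacking_counts(sample))
```

Co-occurrence alone cannot tell who is hijacking whom, which is why the sketch relies on the poster’s camp label rather than on the hashtags by themselves.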

James: It really was an “aha” moment, because […] the data was really unwieldy at first. It was a very cumbersome dataset to work with. First of all, it was large, and we had to do a lot of cleanup of messy data and messy formatting. Because of the volume of data we had from Twitter, processing took a lot of time. I remember working with Ezra and several of our graduate students over several months preparing this data for analysis. Initially it was really difficult to wrap our arms around it. Ultimately we were able to create these networks, and I remember this network was a jaw-dropping moment. For me it is remarkable in terms of the value of data visualization, because it lets you see things that you can’t see just by reading a hundred or a couple of hundred tweets. In this visualization you can see the relationships between all of the actors on Twitter and how closely connected the right-wing hashtags and actors are, forming a real and very dense network. On the left, by contrast, there isn’t a network you can really speak of, but rather a sea of isolated users and voices that were unconnected or very sparsely connected, not really communicating with each other or interacting through likes or retweets. That contrast in the visual form of the networks was so striking, and I think data visualization lets you see something of the social network structure that is difficult to perceive just by reading the tweets themselves.
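The dense-versus-sparse contrast James describes can be quantified as well as seen. A minimal sketch, assuming interactions (retweets, likes, replies) have been reduced to undirected user-to-user pairs: density is the share of possible ties that actually exist, 2m / (n(n - 1)) for n users and m ties, and the number of connected components counts the separate islands. The function names and toy graphs are invented for illustration.

```python
def build_graph(users, edges):
    """Undirected interaction graph as a dict of neighbour sets.

    users: every account in the sample, including ones with no interactions.
    edges: (user, user) pairs, e.g. one per retweet or reply; self-loops are skipped.
    """
    graph = {u: set() for u in users}
    for a, b in edges:
        if a != b:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def density(graph):
    """Share of possible ties that exist: 2m / (n(n - 1))."""
    n = len(graph)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in graph.values()) // 2
    return 2 * m / (n * (n - 1))


def connected_components(graph):
    """Return the connected components as a list of node sets (iterative DFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        components.append(component)
    return components


if __name__ == "__main__":
    # Invented toy data, not the study's networks.
    dense = build_graph("abcd", [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")])
    sparse = build_graph("pqrstu", [("p", "q")])
    for name, g in [("dense", dense), ("sparse", sparse)]:
        print(name, round(density(g), 3), len(connected_components(g)))
    # dense 0.833 1
    # sparse 0.067 5
```

Isolated users are passed in explicitly because an edge list alone would silently drop them, and they are exactly the “sea” that makes the sparse network look fragmented.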

At the same time […] the network gains another layer of meaning when you can look at the actual content of the tweets. When we saw these trends around the iPhone 7 hashtag and the inclusion of Cyrillic script, I think we really saw the value of combining data visualization with the ability to look at the underlying text. It was a wonderful multimodal approach that we built to let this kind of analysis unfold. So from a methodological point of view, I found this really striking and a validation of the visualization methods we were using.

Jeff: Yes, and it showed that visualization is really quite different from your typical scientific method and other data science, because the data is so fluid. You can see things right away. Given what I was expecting to see in this data set, it would have been a complete waste of time, for instance, to formulate any hypotheses going in, because they would have been dismissed as soon as we looked at the data visualization. The visualizations allowed us to take several analytical steps much sooner than we typically could have.

James: So this was 2016, and these were some of the revised networks I created. This one here is from election day.

Jeff: The knockout experiments we did were another enlightening enhancement to our methodology. In a knockout experiment we took out a particular node, or Twitter handle, within a network to see how its absence affected the network. What would this network look like without a particular actor? It’s a clever way of reverse engineering a social network on Twitter, and the best way we had to get a visual assessment of the influence individual actors had on the network. And again, if I had been asked to give a hypothesis before doing this part of the study, I would have predicted that Donald Trump was the lone significant actor in this huge network, and that if you took him out, the network would fall apart or become highly fragmented like Hillary’s. That would have been absolutely wrong. When we knocked Trump’s Twitter handle out of the network, there was still an authentic and significant core of followers. In fact, the most significant followers in the pro-Trump network were right-leaning media personalities. They were his echo chamber, and they accounted for the cohesiveness of the pro-Trump network.
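A knockout experiment of the kind Jeff describes can be sketched by deleting one node and comparing the network’s structure before and after. This assumes an undirected interaction graph stored as a dictionary of neighbour sets; the helper names and toy graphs are hypothetical. The summary reports the number of connected components and the size of the largest one: a network that depends on a single hub shatters, while one held together by a core of secondary actors stays in one piece.

```python
def from_edges(edges):
    """Build an undirected graph as a dict of neighbour sets."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def knockout(graph, node):
    """Return a copy of the graph with `node` and all of its ties removed."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


def summarize(graph):
    """Return (number of connected components, size of the largest one)."""
    seen, sizes = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)
    return len(sizes), max(sizes, default=0)


if __name__ == "__main__":
    # Hypothetical toy graphs. "hub_a" sits on a core of interlinked
    # secondary actors; "hub_b" is the only thing linking its followers.
    cohesive = from_edges([
        ("hub_a", "m1"), ("hub_a", "m2"), ("hub_a", "m3"),
        ("m1", "m2"), ("m2", "m3"), ("m1", "m3"),
        ("m1", "f1"), ("m2", "f2"), ("m3", "f3"),
    ])
    star = from_edges([("hub_b", f"u{i}") for i in range(1, 5)])
    for name, g, hub in [("cohesive", cohesive, "hub_a"), ("star", star, "hub_b")]:
        print(name, summarize(g), "->", summarize(knockout(g, hub)))
    # cohesive (1, 7) -> (1, 6)
    # star (1, 5) -> (4, 1)
```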

James: So it wasn’t so much Donald Trump’s Twitter handle as that whole network of people on the right supporting and feeding into each other online. Trump’s handle has created around it this thick web of users, tweets, retweets, and likes that you can really see and visualize here in this network. This one in particular, I think, shows […] how data visualization can be an analytical mode, not just a pretty picture. If you look at Donald Trump’s and Hillary Clinton’s Twitter nodes in the center, it looks like they’re balancing each other out as two adversarial nodes, but that’s not really the picture. What the network visualization shows us is a bigger picture, one that lets us see all of the right-wing influencers on Twitter in his orbit. They’re almost like rhizomes shooting out from his central presence and creating their own centers of gravity with the people around them. Sadly, we see the traditional news media here in the center being rather insignificant. Overall, I think this network is another example where data, and specifically data visualization, can illuminate something we wouldn’t be able to understand if we just read these tweets manually. You wouldn’t be able to see these trends and connect the dots at this scale to see the bigger picture. When you see the interrelations map […] connecting these different pockets and communities of users, […] you can actually perform a kind of analysis that you couldn’t do before. The other addition of note here is that we used social network analysis metrics like betweenness centrality extensively. The centrality measurements allowed us to home in on who these “influencers” were and to validate a lot of the visualization results in support of our initial findings.
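Betweenness centrality scores a node by how many shortest paths between other pairs of nodes pass through it, which is why it picks out brokers who tie separate pockets of users together. The conversation does not say which software the authors used; the following is a from-scratch sketch of Brandes’ algorithm for unweighted, undirected graphs, with the dictionary-of-sets graph format and the example graph as assumptions.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for unweighted, undirected graphs.

    graph: dict mapping each node to a set of neighbours (both directions present).
    Returns raw (unnormalised) scores; each undirected shortest path is found
    from both of its endpoints, so the totals are halved at the end.
    """
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2 for v, score in bc.items()}


if __name__ == "__main__":
    # Hypothetical graph: two clusters joined only through "broker".
    g = {
        "x1": {"x2", "broker"}, "x2": {"x1", "broker"},
        "broker": {"x1", "x2", "y1"},
        "y1": {"broker", "y2"}, "y2": {"y1"},
    }
    scores = betweenness_centrality(g)
    print(max(scores, key=scores.get))  # broker
```

Unlike a simple count of ties, this measure rewards position: a node with few connections can still score highly if it is the only bridge between communities.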

Jeff: That’s right, James. The centrality measurement was another key to understanding the data. If you just counted the number of tweets, for instance, Hillary would have come out on top. Based on that raw descriptive number, without any contextualization, you might reach an inaccurate conclusion: that the pro-Hillary contingent on Twitter was more dominant than Trump’s, which was clearly not the case. Another way to think of it is how votes work in US presidential elections: whoever gets the most votes across the entire country doesn’t necessarily win. What matters is how votes amass within certain states, and some states are worth more than others. By looking at centrality and relationships in the Twitter datasets, we can see that the pro-Hillary network was simply too fragmented. There were these little islands, many pro-Hillary nodes, but they were small and isolated, and thus marginalized compared to the massive continent of pro-Trump nodes. The way the nodes amassed, not necessarily the number of tweets, is what mattered most. The pro-Trump posters were the more influential actors, even within the pro-Hillary hashtags. They hijacked or cannibalized them, depending on which analogy you want to use. Also, the political right tends to be more cohesive in its agenda and messaging, on and off Twitter.
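Jeff’s point that raw tweet volume and network structure can rank two camps differently can be illustrated with toy numbers. The camp names, tweet counts, and edges below are invented and are not the study’s data; the sketch assumes each camp’s interactions form an undirected graph and uses the size of the largest connected component as one simple measure of how the nodes amass.

```python
def from_edges(edges):
    """Build an undirected graph as a dict of neighbour sets."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def largest_component_size(graph):
    """Size of the largest connected component (iterative depth-first search)."""
    seen, best = set(), 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


# Invented toy data: camp_a posts more but in scattered pairs,
# camp_b posts less but forms one connected cluster.
CAMPS = {
    "camp_a": {"tweets": 12, "graph": from_edges([("a1", "a2"), ("a3", "a4"), ("a5", "a6")])},
    "camp_b": {"tweets": 8, "graph": from_edges(
        [("b1", "b2"), ("b2", "b3"), ("b3", "b4"), ("b4", "b1"), ("b2", "b5")])},
}

if __name__ == "__main__":
    by_volume = max(CAMPS, key=lambda c: CAMPS[c]["tweets"])
    by_cohesion = max(CAMPS, key=lambda c: largest_component_size(CAMPS[c]["graph"]))
    print(by_volume, by_cohesion)  # camp_a camp_b
```

Largest-component size is only one possible structural measure; centrality scores, as discussed above, give a finer view of which actors hold a cluster together.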
