When beginning this project, we were tasked with finding a research topic that could accommodate both my background in Computer Science and natural language processing and Adem’s work analyzing political speeches, in particular how those speeches relate to a speaker’s reception and how outside events influence the topics the speeches cover. It felt natural that we would analyze political language in some form. After exploring several avenues of political language, we settled on examining the language used by candidates debating to become President of the United States. We made this decision both because debate transcriptions are readily available and because of our genuine interest in what studying this particular data-set might reveal.
Because of the size and nature of the data-set we had accumulated, deciding which avenue of analysis to pursue was initially difficult. To get a better sense of which aspects of the transcriptions were worth exploring, we began with an initial pass through our data using the Jigsaw platform, running both entity analysis and sentiment analysis on the transcripts of candidates who won and lost their respective elections. This proved very useful in our brainstorming phase and helped us shape our project’s main research question.
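For readers curious what such an exploratory pass might look like in code, here is a minimal sketch of a sentiment comparison between a winning and a losing candidate. Jigsaw itself is a point-and-click tool, so this is only an approximation of that kind of analysis, and the file layout (one speaking turn per line, split into winner and loser files) is purely hypothetical:

```python
# A minimal sketch of an exploratory sentiment pass over debate transcripts.
# The real exploration used the Jigsaw platform; the file paths and the
# one-turn-per-line layout assumed here are hypothetical.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

for outcome in ("winner", "loser"):
    with open(f"transcripts/1992_{outcome}.txt") as f:  # hypothetical paths
        turns = [line.strip() for line in f if line.strip()]
    # Average the compound score across speaking turns for a rough overall signal.
    scores = [analyzer.polarity_scores(turn)["compound"] for turn in turns]
    print(outcome, round(sum(scores) / len(scores), 3))
```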
After completing our initial research, we settled on our primary research question: during United States presidential debates, what do winning and losing candidates tend to focus on, and how does their choice of language affect the outcome of the election? Their individual talking points may be tied to the events of their time, but is there a clear connection between language use and the election’s outcome? We arrived at this question after conducting sentence-structure analysis in the Jigsaw platform and noticing the topics (in this case, entities) the candidates covered most regularly. From there, we decided to zoom in on individual elections to see how the events surrounding each one shaped the topics discussed, and how that related to the election’s outcome.
Once we had decided on a specific research topic, we began exploring vocabulary usage and sentence analysis with the Voyant platform, which gave us immediately tangible results from the text transcriptions. One Voyant feature we found quite helpful was its word-cloud creation tool. Word-clouds have well-known validity problems as a basis for research in Digital Humanities, yet they are used by nearly every news outlet that visualizes political speech. In our preliminary survey of scholarly visualizations of political debates, word-clouds were by far the most common form. It is easy to understand why: they are simple to create and give a quick snapshot of a speech. But in terms of lasting conclusions and real generation of knowledge, these visuals offer little; they usually only scratch the surface of a text. With these limitations in mind, we decided to create an individual word-cloud of each candidate’s dialogue for every debate. We knew this was not the end point of our visual exploration of the transcripts, but it served as a solid static jumping-off point for viewers: they could glance at these visuals, get a quick sense of the topics a particular candidate emphasized, and then dig deeper into the research themselves, following the martini-glass structure of project presentation described by Edward Segel and Jeffrey Heer.
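For anyone wanting to reproduce something similar outside of Voyant, a word-cloud for a single candidate in a single debate might look roughly like the following sketch, using the open-source wordcloud library for Python. The file paths are illustrative; our actual clouds were generated directly in Voyant:

```python
# A minimal sketch of per-candidate word-cloud generation (illustrative only;
# we used Voyant's built-in word-cloud tool for the project itself).
from wordcloud import WordCloud, STOPWORDS

text = open("transcripts/1960_kennedy_debate1.txt").read()  # hypothetical path

cloud = WordCloud(
    width=800,
    height=500,
    background_color="white",
    stopwords=STOPWORDS,  # drop common function words so topics stand out
).generate(text)

cloud.to_file("clouds/1960_kennedy_debate1.png")
```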
In order to create a visualization that is not only pleasing and approachable but also generates knowledge in the way Tanya Clement describes in her work, we knew we needed something the user could interact with. Our first task was to bring our transcriptions into the Gephi platform and create a network visualization of the vocabulary used by the winners and losers of the elections our transcripts covered. We chose a network design because of my previous experience creating language-based networks earlier in the course. In building the network visualizations, however, we had to decide whether to make multiple networks, one for each election, or one large, all-encompassing visualization of overall vocabulary usage split between winners and losers. Because the time in which these debates took place matters so much, scrapping the temporal component of our data entirely would have meant losing a great deal of information; the problem was how to display that time information effectively. Gephi has a timeline tool that lets the user mark nodes and edges with time intervals for a dynamic display. The trouble with this, beyond the finicky nature of the Gephi platform itself, is that our data-set is large enough that Gephi would struggle to render it, and even then the result would be difficult for a viewer to interpret. Instead, we decided to make one overall network of all debates, plus a visualization for each individual election, ordered chronologically for the reader to explore in or out of order. Using the TimelineJS interface, we were able to post links to all of our interactive visualizations, created with the Gexf-JS web viewer, and to add context for each election in the form of the events surrounding that period. This let us frame each visual so that users could draw more educated and informed conclusions from our data.
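Although our networks were assembled in Gephi itself, a rough sketch of how a vocabulary co-occurrence network could be built and exported to the GEXF format that Gephi and the Gexf-JS viewer read might look like the following, using the networkx library for Python. The tokenization rules, thresholds, and file paths here are assumptions for illustration, not our actual pipeline:

```python
# A rough sketch of building a word co-occurrence network and exporting it as
# GEXF for Gephi or the Gexf-JS web viewer. Tokenization, the min_count
# threshold, and the file paths are illustrative assumptions.
import itertools
from collections import Counter

import networkx as nx

def cooccurrence_graph(text, outcome, min_count=3):
    """Link words that appear together in the same (roughly split) sentence."""
    graph = nx.Graph()
    pair_counts = Counter()
    for sentence in text.split("."):
        words = sorted({w.lower().strip(',;:"') for w in sentence.split() if len(w) > 3})
        pair_counts.update(itertools.combinations(words, 2))
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            graph.add_edge(a, b, weight=count)
    # Tag every node with the candidate's outcome so winner/loser networks can be compared.
    nx.set_node_attributes(graph, outcome, "outcome")
    return graph

g = cooccurrence_graph(open("transcripts/1980_reagan.txt").read(), "winner")
nx.write_gexf(g, "networks/1980_reagan.gexf")  # readable by Gephi and Gexf-JS
```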
After constructing our website with all of our visuals, including links to the Voyant-created ones as well as the interactive networks, we had a better idea of how our research question might be answered. Looking over each set of visuals, it became abundantly clear that while some terms, such as talk of the future and of community, are more associated with election winners than others, the vocabulary that led to success was largely a function of the time in which the debate took place. Whether the world was in a bout of political or economic turmoil or the nation was enjoying a period of prosperity, the winning set of terminology varied. On reflection, this makes sense: what the American people want and need to hear at one moment may be very different from what they require at another. This project gave me an interesting look into the United States political system and how we, as individuals, view those we put in positions of power. Our collective consciousness jumps to an intense focus on topics like liberty and security after we have been badly beaten, and toward social issues when we have the time to look inward at our own national needs. But no matter what, what we say and what we want to hear can directly shape how we view the world. That is why we must acknowledge our needs consciously, and choose our words carefully.