Blog Post 6: Viewing The Political World Through The Spoken Word

When beginning this project, we were tasked with finding a research topic that could accommodate both my background in Computer Science and natural language processing and Adem's work analyzing political speeches, including the connections between a speech's topics and the individual speaker's reception. It felt natural to analyze political language in some form. After exploring different avenues of political language, we settled on examining the language used by debaters contending to be the President of the United States. We made this decision both because the debate transcriptions were readily available and because of our genuine interest in what might be discovered by studying this particular data-set.

Due to the size and nature of the data-set we had accumulated, it was initially difficult to decide which avenue of analysis to pursue. To get a better sense of what the transcriptions might offer, we began with an initial exploration of our data in the Jigsaw platform, running both entity analysis and sentiment analysis on the transcripts of candidates who won and lost their respective elections. This proved very useful in our brainstorming phase and helped us shape our project's main research question.

After completing our initial research, we settled on our primary research question: during United States presidential debates, what do winning and losing candidates tend to focus on, and how do their individual vocabulary choices affect the outcome of the election? While individual talking points may be tied to events of the time, is there a clear connection between language use and the election's outcome? We arrived at this question after conducting sentence-structure analysis in the Jigsaw platform and noticing the obvious topics (in this case, entities) regularly covered by the candidates. From here we decided to zoom in on individual elections to see how specific events surrounding each election shaped the topics discussed, and how this related to the election's outcome.

After we had decided on a specific research topic, we began exploring vocabulary usage and sentence analysis with the Voyant platform, which gave us immediately tangible results from the text transcriptions. One Voyant feature we found quite helpful was its word-cloud creation tool. While word-clouds are widely questioned in Digital Humanities as a valid research method, they are used by almost every news outlet that visualizes political speech. In our preliminary survey of scholarly visualizations of political debates, word-clouds were by far the most common form. It's easy to understand why: they are simple to create and give a quick snapshot of a speech. But they offer little in the way of lasting conclusions or individual generation of knowledge; often they merely scratch the surface of a text. Even so, we decided to use individual word-clouds of each candidate's dialogue for every debate. We believed that, while these were not the end-all of our visual exploration of the transcripts, they served as a quality static jumping-off point for viewers: a reader can get a quick sense of the topics a particular candidate emphasized and then delve deeper into the research themselves, following the martini-glass structure of project presentation described by Edward Segel and Jeffrey Heer.

In order to create a visualization that is not only pleasing and approachable but also generates knowledge in the way Tanya Clement describes in her work, we knew we needed to make something the user could interact with. To do this, we first brought our transcriptions into the Gephi platform to create a network visualization of the vocabulary used by the winners and losers of the presidential debates in our transcripts. We chose a network design because of my previous experience creating language-based networks earlier in the course. In building the visualizations, however, we had to decide whether to make multiple networks, one for each election, or one large all-encompassing visualization of overall vocabulary usage split between winners and losers. Because the timing of these debates matters, scrapping the temporal component of our data entirely would mean losing a large amount of information; the problem was how to display that time information effectively. Gephi has a timeline tool that lets the user mark nodes and edges with time intervals for a dynamic display. The trouble, beyond the finicky nature of the Gephi platform itself, is that our data-set is so large that Gephi would struggle to render it effectively, and even then it would be difficult for a viewer to interpret. Instead, we decided to make one overall network of all debates, plus a visualization for each individual election, ordered chronologically for the reader to discover in or out of order. Using the TimelineJS interface, we were able not only to post links to all of our interactive visualizations, created with the GEXF-JS web viewer, but also to add context for each election in the form of events surrounding each time period. This gave us the ability to frame each visual in a way that allows the user to draw more educated and informed conclusions from our data.

After constructing our website containing all of our visuals, with links to the Voyant-created ones as well as the interactive networks, we got a better idea of how our research question might be answered. Looking over each set of visuals, it became abundantly clear that while some overall terms, such as talk of the future and community, might be more associated with election winners than others, the vocabulary that led to success was largely a function of the time in which the debate took place. Whether the world was in the middle of political or economic turmoil, or the nation was enjoying a period of prosperity, the winning set of terminology varied. This makes sense upon reflection: what the American people want or need to hear at one moment may be very different from what they require at another. This project has given me an interesting look into the United States political system and how we, as individuals, view those we put in positions of power. Our collective consciousness has a way of fixating on topics such as liberty and security after we have been badly beaten, and on social issues when we are given the time to look inward at our own national needs. But no matter what, what we say and what we want to hear can directly shape how we view the world. That's why we must acknowledge our needs consciously, and choose our words carefully.

Relational Graph Analysis of Characters in World of Warcraft (Collaboration w/ Zhengri Fan)

Data Preparation (Zhengri)

Network visualisations start with questions (Lima). We started this project from our previous visualisation on the topic of RPG games, and our question was: "Is our previous observation true in actual cases?" To answer it, we took an official novel of World of Warcraft (official story text), the War of the Ancients Trilogy, as our source. We created two datasets from the text: 1. the relationships between characters, and 2. each character's identity. The first step was to find the characters in the text, so we used the tool Jigsaw to extract person names. We then used algorithms to build a relationship table. For the character identity table, we built a data scheme of Name, Gender, Affiliation, and Race. We chose these attributes because our goal is to test our previous predictions, which mainly concerned the gender and affiliation of characters in RPG games, against an actual case. I won't discuss the data preparation in detail because it was mainly my colleague's work; if you would like to explore it further, please visit his blog post. Still, the data preparation should count as a very important part of our project, because it is the basis for everything that follows.
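The relationship table itself was built with Jigsaw and my teammate's own scripts, so the sketch below is only an illustration of one common approach: counting how often two extracted character names appear near each other in the text. The file name, the name list, and the window size are all assumptions for illustration, not our actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical list of character names already extracted (e.g., with Jigsaw).
names = {"Malfurion", "Illidan", "Tyrande", "Krasus", "Rhonin"}

# Simple whitespace tokenization; a real pipeline would also strip punctuation.
with open("war_of_the_ancients.txt", encoding="utf-8") as f:  # assumed filename
    tokens = f.read().split()

WINDOW = 50  # names within the same 50-token window count as one co-occurrence
edges = Counter()
for i in range(0, len(tokens), WINDOW):
    present = set(tokens[i:i + WINDOW]) & names
    # every unordered pair of names in the window strengthens their edge
    for a, b in combinations(sorted(present), 2):
        edges[(a, b)] += 1

# Write a Gephi-readable edge list: Source,Target,Weight.
with open("edges.csv", "w", encoding="utf-8") as f:
    f.write("Source,Target,Weight\n")
    for (a, b), w in edges.items():
        f.write(f"{a},{b},{w}\n")
```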

Data Analysis w/ Gephi (Jiayu)

I got the list of character relationships from Albert (my teammate) and then started using Gephi to visualize the graph data. Though Gephi has been updated since my first use last year, it still cannot handle spaces in text. Therefore, before inputting the name data into Gephi to create the relational graph, I first eliminated the spaces using the formula =SUBSTITUTE(A1, " ", "_") in Excel (I mention this because it might be really helpful for future Gephi users). After I input my data into Gephi, the output looked like the graph on the left (well, not exactly the same, but the "DEGREE OF MEANINGLESS" matches). It looks pretty, but it reveals nothing: even if we make the strong relationships look more significant and darken the colors of the influential nodes, it tells us nothing. The next step after creating the relations was to attach an identity to each node, so we combined the information in our identity list with our relational graph; each node in our dataset now carries its gender, affiliation, and race. We chose those attributes to build our node identity scheme because we wanted to continue our previous project on RPG games, which revealed facts about gender and affiliation. Gephi's data management feature works very well here because it can do a natural join on two datasets (it combines the relations with the node attributes even though they are two separate datasets, using the name as the key/id). So our further analysis of this WoW character data covers: 1. gender, 2. affiliation, 3. race.
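For readers who prefer scripting over Excel, the same two steps, replacing the spaces in names and joining the identity attributes onto the nodes by name, can be sketched in a few lines of pandas. The file and column names below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical inputs: an edge list and a character identity table keyed by name.
edges = pd.read_csv("edges.csv")        # columns: Source, Target, Weight
identity = pd.read_csv("identity.csv")  # columns: Name, Gender, Affiliation, Race

# Replace spaces with underscores, the same fix as =SUBSTITUTE(A1, " ", "_").
for col in ("Source", "Target"):
    edges[col] = edges[col].str.replace(" ", "_")
identity["Name"] = identity["Name"].str.replace(" ", "_")

# The "natural join": every node row carries its attributes, keyed by name,
# so Gephi can match them against the Source/Target ids in the edge list.
nodes = identity.rename(columns={"Name": "Id"})
nodes["Label"] = nodes["Id"]

edges.to_csv("gephi_edges.csv", index=False)
nodes.to_csv("gephi_nodes.csv", index=False)
```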

Gender Analysis on WOW Data:


On the right is a character relation graph with partition coloring based on gender. Green stands for male and red stands for female. The blue nodes (yes, they do exist) are those of unknown gender (animals or simply unknown types). The pattern is, well, very straightforward yet predictable: males dominate RPG games and stories. Though there are still some red points with strong connections, it doesn't change the impression that the story does not really need a female figure, even in the most famous and long-lived RPG franchise, World of Warcraft. This pattern is even more shocking if we group the data (the graph on the right). The big green dot is of course the male character group, and the poor unknown group is the small dot at the four o'clock direction (if you can't spot it at first glance, you may read the post on the high-definition webpage to find it). The exploration of WoW confirms our previous observation very well.
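The grouping itself was done with Gephi's partition and group features, but the underlying tally is easy to reproduce as a sanity check. A minimal sketch, assuming the hypothetical identity file from the data-preparation step:

```python
import pandas as pd

# Hypothetical identity table from the data-preparation step.
identity = pd.read_csv("identity.csv")  # columns: Name, Gender, Affiliation, Race

# Count characters per gender category, including the unknown ones.
print(identity["Gender"].value_counts(dropna=False))
```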


Affiliation Analysis on Gephi:

Another observation from our previous data analysis of RPG games was that the affiliation of the bad guys, the villains, plays a more important role in RPG games. However, the graph tells a different story for World of Warcraft. On the left is a visualization of the graph grouped by affiliation: red is the good guys, black tends to be the villains, and green is the characters on the neutral side. The connections between the red points completely shut down our previous prediction about the influence of affiliation in RPG games. When we explore the most significant points (the big green nodes of the neutral side), we find something interesting that may explain the mis-prediction. The neutral characters, even though they act neutral, have few connections with the villains. The connections between villains and heroes, by contrast, are always very strong (the widest red line), and there are only a few connections between different villains. So we can find a story behind this affiliation relationship. Because the neutral characters have no connections with the villains, we can say they are more like "background NPCs" than core characters: they are not actually involved in the conflict and are only mentioned because the protagonists meet them. And because the connections between the villains are very weak, we can conclude that they are truly THE VILLAIN: strong enough to fight the heroes without much cooperation. The strong connection between heroes and villains likely reflects the massive conflict of the main story line. So we can conclude that the villains in our story are not, as the graph first suggests, unimportant; they are simply depicted as lonely villains, and their strong connections with the heroes prove their importance. That said, I would say the villain's figure in World of Warcraft is quite cliché: just a very traditional Byronic hero.

Race Analysis on Gephi:

The third Gephi analysis we built is the partition based on race. I would like to use the race analysis to support my earlier reading of the affiliation graph: that the strength of a node's relationships reveals its level/status in the story, while the size of the node reveals its level of loneliness. This graph shows that in the WoW novel, the race with the most characters is the night elf (the big purple node on the graph), but it is not the most important one. In fact, the red_wyrm (red dragon) exerts the most influence on the story. There is only one red dragon in WoW's world, so its node is really small; however, its edges are giant, reaching across the whole graph. In the actual story, the red dragon truly is the most influential character, by the way, which confirms my assumption about how to read the graph.

Graph Theory Analysis on Gephi:

Beyond infographic-style analysis, Gephi can also do some very interesting data analysis on the graph it creates. When we talk about graph theory here, we are using it to answer statistical questions about the data: we would like to know the distribution, shape, and density of the graph. That is, we would like to know how the characters are connected, and whether they are connected tightly or not. On the left is the graph-theoretic analysis from Gephi. Average degree is the average influence per character, and graph density measures how connected the graph is. Of these numbers, the most interesting is the average path length. From the six-degrees-of-separation theory, we would predict that in a real-world social network the average path length is about 6. In WoW, however, the average path length is 1.32. That means that even a nobody in that world may connect to the villain within about two steps. Relationships in this RPG are really tight; in other words, social status and social barriers are very thin in the RPG game's world.
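Gephi reports these numbers from its Statistics panel; for anyone who wants to double-check them in code, here is a minimal sketch computing the same three metrics with the networkx library, assuming the hypothetical edge-list file from earlier.

```python
import networkx as nx
import pandas as pd

# Load the (hypothetical) weighted edge list exported earlier.
edges = pd.read_csv("gephi_edges.csv")
G = nx.from_pandas_edgelist(edges, source="Source", target="Target", edge_attr="Weight")

# Average degree: edge endpoints per node.
avg_degree = sum(dict(G.degree()).values()) / G.number_of_nodes()

# Density: the fraction of all possible edges that actually exist.
density = nx.density(G)

# Average path length is only defined on a connected graph,
# so compute it on the largest connected component.
largest_cc = G.subgraph(max(nx.connected_components(G), key=len))
avg_path = nx.average_shortest_path_length(largest_cc)

print(f"average degree:  {avg_degree:.2f}")
print(f"graph density:   {density:.3f}")
print(f"avg path length: {avg_path:.2f}")
```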


Comparison with Our Previous Tools: Google Fusion Tables and Palladio (Zhengri)


In this section, I would like to state the conclusion first: Gephi is much more sophisticated than Google Fusion Tables and Palladio, because those two tools' features are only a subset of Gephi's. (Palladio has updated from 1.01 to 1.13, and its performance is very good now.) Google Fusion Tables and Palladio are easy to use compared to Gephi, but everything they can do can also be done in Gephi, while they cannot do what Gephi can. The first feature they lack is Gephi's data management: Google Fusion Tables and Palladio can hardly do database operations like a theta join or natural join on a dataset, so a data scheme cannot be attached to the data relations. Second, their graphic models are insufficient; the only visualization model they can use is force atlas. Finally, it is hard for them to do deep data analysis based on graph theory. Comparing Gephi with them is like comparing Photoshop with the Windows paint tool: though they look similar, they live in totally different categories (a professional productivity tool versus a temporary tool for fun). They can do simple visualisations, but they are not able to do deep analysis. I should admit that at least their output looks very nice.

Curriculum Visualization in Gephi

For this assignment, I once again chose to visualize Bucknell's database of course information for the Fall of 2015, so that I could most effectively compare Gephi to Google Fusion Tables. I modified my original dataset to work in Gephi by creating a CSV file containing nodes and a corresponding edges file to draw directed edges between the nodes. Courses and College Core Curriculum (CCC) requirements are represented by nodes, and edges are drawn from a course node to a requirement node. This involved algorithmically generating the edge list to link nodes, which Google Fusion Tables had done for me automatically. Although there was additional overhead in developing input data suitable for Gephi, it came with the added benefits of directed edges and the ability to specify a weight for each edge (which I chose to be the number of sections of a course that fill the requirement). I chose to run the Fruchterman-Reingold algorithm on my data because Force-Atlas created a large clump in the center, due to the CCC nodes being heavily linked to other nodes.
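The course database itself isn't shown here, so the following is only a minimal sketch of what "algorithmically generating the edge list" can look like: turning hypothetical (course, requirement) section rows into a Gephi-ready node list and a weighted, directed edge list. The course names and requirement codes are made up for illustration.

```python
import csv
from collections import Counter

# Hypothetical flattened course data: one row per course *section* and the
# CCC requirement that the section fills.
sections = [
    ("ECON 103", "CCC-SSLC"),
    ("ECON 103", "CCC-SSLC"),
    ("PHIL 100", "CCC-WRIT"),
    ("CSCI 203", "CCC-QR"),
]

# Edge weight = number of sections of a course that fill the requirement.
weights = Counter(sections)

nodes = {c for c, _ in sections} | {r for _, r in sections}
with open("nodes.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Label"])
    for n in sorted(nodes):
        writer.writerow([n, n])

# Every edge is directed from a course node to a requirement node.
with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (course, req), weight in weights.items():
        writer.writerow([course, req, "Directed", weight])
```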

After partitioning the nodes by department, running the Modularity analysis on my graph, and reducing the edge size, I was able to create a very attractive visualization. The Fruchterman-Reingold algorithm placed courses that do not meet any CCC requirements around the outer edge and beautifully interwove the remaining data in the center of the graph. The center consists of clusters of high-degree CCC nodes with large numbers of course nodes directed toward them. Due to the structure of my input data, the CCC nodes with the highest degree are also the nodes with the largest betweenness, since every edge starts at a course node and ends at a CCC node, which results in the maximum path length being one. Eigenvector centrality yields a similar result, as my input data is not complex enough to yield insight beyond in-degree and out-degree. In future iterations, I plan to modify the input data so these metrics become useful.

After turning on labels for nodes, the aesthetic appeal drops dramatically because, unlike Google Fusion Tables, Gephi does not selectively label nodes based on their size. Google Fusion Tables is able to dynamically resize labels and toggle their visibility based on your zoom level. Overall, although Gephi was able to create a more aesthetically pleasing visualization, I found that Google Fusion Tables made it easier to explore the data and its connections, especially with its filtering abilities. However, I believe I can create a more functional Gephi visualization with some modification to the input data, so that I can make better use of the software's advanced analysis tools. I would like to find ways to resize nodes and distinguish CCC nodes from course nodes, which I was able to implement in Google Fusion Tables. One downside I experienced with Google Fusion Tables was that it would automatically hide nodes it deemed insignificant, as a way to provide a more organized view of the data. Gephi offers the modularity to keep all nodes and reorganize them as needed.

I believe that this visualization meets some of Lima's requirements for networks. This is a new, unique visualization, since we typically see course data in a table view, so it creates the potential to generate new insights into the Bucknell curriculum. The graph clarifies our understanding of relationships between nodes by drawing the relevant edges, color-coding nodes by department, and using network algorithms such as Fruchterman-Reingold to create organized clusters of nodes. This graph makes it easy for people to see the outliers around the outer edge as well as the high-degree nodes in the center. Both types of nodes can be highly significant, so it helps that Gephi keeps all nodes in the visualization, even if they may not seem important. I did not find that this visualization greatly expanded my knowledge of the data, since it mostly provides information similar to what was already discovered in previous network graphs of the data. On the other hand, Gephi does a fantastic job of creating aesthetically pleasing visualizations that look like art. I'm excited to expand on my work and create more complex graphs that unlock additional insight into my data.