Assignment 5

I collected my data from the syllabuses of the Comparative Humanities core courses.  The three core courses are HUMN 128 – Myth Reason Faith (18th Century BCE-1295), HUMN 150 – Art Nature Knowledge (1486-1859), and HUMN 250 – Nihilism Modernism Uncertainty (1882-1957).  Together, these courses are advertised as the history of human thought.  They cover works starting in the 18th Century BCE with the Enuma Elish and ending in 1952 with Ralph Ellison’s Invisible Man.  I listed the title of each work we studied in each of the courses, along with its author, date of publication, coordinates, the author’s sex and ethnicity, and the course it was taught in.  Because the curricula aim to cover such a large time scale, they cannot include all notable humanities works in every genre.  Visualizations of the course data can draw attention to the areas that may have become invisible in the process of simplification.

palladio map sized

Using the Palladio mapping feature, I plotted the location of publication/creation of each work we study in the humanities core curriculum.  The highest concentration of works is in London and Paris, with Europe in general heavily represented.  The curriculum does primarily cover Western thought, so this pattern is unsurprising.  However, South America, Africa, and Southeast Asia are entirely unrepresented.

palladio graph course ethnicity     gephi author-ethnicity

The gaps in coverage of author nationality/ethnicity are less immediately obvious in Palladio’s graph function and in Gephi than in the Palladio map.  The multitude of nodes gives the false impression of ethnic diversity.  The array of colored nodes in Gephi makes it seem like a broad range of ethnic groups is represented in the courses, but the network visualizations only show the groups that are represented, not the ones that are invisible.  However, the Palladio graph can compare the relative diversity of one course to another.

palladio graph course genre     gephi title-genre labeled

In both Palladio and Gephi I visualized the genre of each work, with the Palladio graph additionally dividing the genres based on the course they were covered in.  In the Gephi network, the dominance of philosophy and literature is obvious due to the color coding.  The Palladio network is more useful for showing the genres studied in each individual course rather than the total popularity of a single genre across the courses.

palladio graph author sex       gephi author-gender

I separated the authors into categories based on their sex, revealing the obvious and enormous disparity between the number of male and female authors.  Of the eight female authors we discuss in the humanities core courses, five (Wollstonecraft, Shelley, Woolf, de Beauvoir, Kaplan) are almost exclusively analyzed within the context of feminism.  None of the authors is considered non-binary.  Because the force-directed graph in Gephi shrinks the distance between “male” nodes, the gender gap is more visible in Palladio.


gephi title-genre labeled fr modularity      gephi title-genre

The above visualizations of the different genres represented in the humanities core syllabuses, both made in Gephi, are examples of different syntax applied to the same data.  In the Fruchterman-Reingold layout (radial implosion), the most popular genres do not stand out as obviously as in the force-directed layout (centralized burst).  The centralized burst concentrates the most relevant nodes at the center of the visualization, drawing the viewer’s eyes more quickly to the differences between nodes.  Viewers of the Fruchterman-Reingold graph must search for node color among the evenly spaced nodes.  One well-connected node is immediately noticeable, but most of the rest are lost in the sphere.


In both platforms I was unsure of how to visualize all of my data at once.  I can’t color code individual nodes in Palladio as I’d like to, and Gephi gets confused with too many variables.

Character Relational Analysis of the World of Warcraft Novels (Collaboration w/ Jiayu Huang)

For this assignment, we decided to come back to RPGs. For network visualizations, one plausible way of constructing a network is from the relationships between characters. We therefore chose the World of Warcraft (WoW) novels as our source, since they offer more detail than our memory of the game’s plot. In the end we picked one of the best-known official novel series for the game, the War of the Ancients Trilogy.


Data Construction

Before we could do anything analytical, we needed to construct the base data for our visualization. Since we decided to analyse the relationships between the characters of the novels, the very first datasheet is simple: if two characters have a connection in the novels, we write a line with both names separated by a comma. If the characters have multiple connections, we repeat the line that many times. Since our raw data is raw text, we used Jigsaw for help. The method is simple: import the texts and a custom entity list of character names, and let Jigsaw analyse their connections in the text. Then, with a list view, we could examine which characters connect and how often, and store the information in a spreadsheet in which the first two columns are the names in the connection and the third column is the frequency of that connection (Figure 1). Then I wrote a simple script to repeat each line n times, where n comes from the third column of the previous spreadsheet (Figure 2). Also, with the help of WoWWiki, we made a simple identity list of the characters, which contains each one’s race, gender (if applicable), and affiliation in the novels.
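The line-repeating script itself is not shown in the post, so here is only a minimal sketch of what such a step could look like in Python. The function name expand_edges and the sample rows are hypothetical placeholders, not our real data; the only thing taken from the text is the layout of two names followed by a frequency.

```python
import csv
import io


def expand_edges(rows):
    """Repeat each (source, target) pair as many times as its frequency says."""
    expanded = []
    for source, target, count in rows:
        # The frequency arrives as text when read from a spreadsheet export.
        expanded.extend([(source, target)] * int(count))
    return expanded


# Hypothetical sample rows in the (name, name, frequency) layout described above.
sample = [
    ("Character_A", "Character_B", "3"),
    ("Character_A", "Character_C", "2"),
]

# Write the expanded edge list as comma-separated lines, one connection per line.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(expand_edges(sample))
print(buffer.getvalue())
```

Each output line is one connection, so a pair with frequency 3 simply appears three times.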

Figure 1

Figure 2

Analysis in Gephi

First complaint: it is beyond my imagination that Gephi does not support spaces inside the input text. So before we could do any analysis, we had to replace all spaces with “_”, which can be done in multiple ways. Anyway, after we imported all the necessary data and made some modifications, the initial graph looks like the following:

This initial visualization is useful, but it hides too much information. Although we could draw some conclusions from it, it does not show any result clearly, so we processed the visualization further. One important step is to connect the identity list with the connection list; otherwise we have nothing meaningful beyond a beautiful network graph, and no way to compare it with our previous RPG analysis. Because Gephi can perform a natural join, we could easily combine our two data sets and reveal the relationships of race, gender, and affiliation.
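We did the join inside Gephi, but the idea is easy to show outside it too. The following is only a rough sketch in Python of joining the connection list with the identity list on the shared name, in the spirit of the natural join described above. The function natural_join, the column names, and all sample values are made up for illustration.

```python
def natural_join(edges, identities, key="name"):
    """Attach each endpoint's attributes to every edge, matching on the name."""
    by_name = {row[key]: row for row in identities}
    joined = []
    for source, target in edges:
        if source not in by_name or target not in by_name:
            continue  # rows without a match on both sides are dropped
        row = {"source": source, "target": target}
        for side, name in (("source", source), ("target", target)):
            for attribute, value in by_name[name].items():
                if attribute != key:
                    row[f"{side}_{attribute}"] = value
        joined.append(row)
    return joined


# Hypothetical connection list and identity list (placeholder values only).
edges = [("Character_A", "Character_B"), ("Character_A", "Character_C")]
identities = [
    {"name": "Character_A", "race": "night_elf", "gender": "male", "affiliation": "good"},
    {"name": "Character_B", "race": "red_wyrm", "gender": "unknown", "affiliation": "neutral"},
]

for row in natural_join(edges, identities):
    print(row)
```

Character_C has no entry in the identity list here, so its edge is left out, which is a reminder to check that every name in the connection list also appears in the identity list.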

Gender analysis


The gender problem, specifically the domination of male characters, also exists here. In the right figure, the male nodes are colored green, while the female nodes are colored red. Characters of unknown gender (either non-human characters or characters with almost no gender information) are colored blue. As we can see, green dominates the screen, which reveals the fact that only a few female characters are mentioned in the text. Furthermore, among the females, only two red nodes have significant connections with other nodes in the graph, and the connections of the one female character who sits among the central nodes are comparatively weaker than those of the green ones. If we group the data (left figure), magic happens: we have our predictable giant green dot, and a small red triangle lies lonely in the 2 o’clock direction. Oh, and the dust-like unknown genders lie in the 4 o’clock direction, FYI.

From the previous visuals, we can see that one general problem with RPGs, the male-domination problem, still exists in the WoW series.


Affiliation Analysis

Then we chose to analyse the affiliation of each character. A node can be red, which means the character is for the common good of the people; black, which means evil lives deep in their heart; or green, which means they barely picked a side between the good and the bad. Contrary to our previous analysis, in which the villains tended to play an influential role in those games, we can see that the green nodes barely connect to the black ones, and there are only a few connections among the villains, while strong connections are present between the good and the bad. Thus we can draw some conclusions about the plot. The neutral characters act more like background characters: since they have little to do with the villains, it seems that they aren’t really involved in the conflict between the reds and the blacks. Instead, they might be someone’s teacher, or someone loved by the side characters; they could even be poor victims of the villains who never got a chance to pick a side. From the strong relationships between the reds and the blacks, we can also conclude that the conflict between them is a big one, and it is, because they are at war (as the title suggests). Therefore, we can conclude that the bad people in the story are very clichéd characters who are totally evil, or BLACK, and it is important to think of them as having some bright points.


Race Analysis

There are plenty of races involved in the story. In the figure, the strength of a node’s relationships reveals its level/status in the story, and the size of the node reveals its level of loneliness. As we can see, the most mentioned race is the night elf, and the most influential one (the one connected to the most races) is the red wyrm. Dragons are barely mentioned in the WoW series except when the King/Queen of the species is mentioned, which explains why those nodes are small even though they connect to almost all the other species. As background, the red wyrm mentioned in the text is one of the main characters.

Gephi analysis with graph theory

Gephi is far more than a data visualizer. It can generate statistical information such as the distribution, shape, and density of a network. If we want to know how well the characters are connected with each other, we can let Gephi generate some numbers for analysis. The right figure shows the statistics we got from Gephi: the average degree, the network diameter, the graph density, and the average path length. Of course, the average path length is much smaller than the real-world figure of 6, since the story is told from one particular character’s viewpoint, which makes all the connections closer than they should be.
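To make two of those numbers less mysterious, here is a small sketch of how average degree and graph density can be worked out by hand. It assumes an undirected graph in which repeated lines between the same two characters count as one edge. That is my own simplification, and Gephi’s report may treat weights and directions differently. The function graph_stats and the toy edge list are hypothetical.

```python
def graph_stats(edges):
    """Return (average degree, density) for an undirected simple graph."""
    # Collapse repeated lines and ignore self-loops.
    pairs = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    nodes = {name for pair in pairs for name in pair}
    n, m = len(nodes), len(pairs)
    average_degree = 2 * m / n if n else 0.0          # every edge touches two nodes
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0  # share of possible edges present
    return average_degree, density


# Toy edge list with placeholder names; the repeated pair is counted once.
toy_edges = [
    ("Character_A", "Character_B"),
    ("Character_B", "Character_C"),
    ("Character_A", "Character_B"),
]
average_degree, density = graph_stats(toy_edges)
print(average_degree, density)
```

A density close to 1 means almost every pair of characters is directly linked, while a value near 0 means the network is sparse.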

Tools comparison:


Gephi, like the two previous tools, Palladio and Google Fusion Tables, is a network visualization tool, so it makes sense to compare them. The left graph is the result of the same data set visualized in Palladio, and the right one is that of Google Fusion Tables. First thing to mention: since Palladio has received huge updates, it is much more responsive than before. (I noticed!) In comparison, Google Fusion Tables and Palladio are more like subsets of Gephi, in that most features the former two support are also supported by Gephi, while Gephi has some features that are missing in Palladio or Fusion Tables. For example, data management is difficult in Fusion Tables or Palladio, especially natural joins, while in Gephi it is a piece of cake. Also, the visualizing tools in Palladio/Fusion Tables only have the Force Atlas layout, while Gephi offers multiple layouts. And most importantly, they cannot generate numbers based on graph theory, while Gephi can do that easily. Their only advantage is that both Fusion Tables and Palladio can link the nodes to actual maps.



Relational Graph Analysis on Characters in World of Warcraft (Collaboration w/ Zhengri Fan)

Data Preparation (Zhengri)

Network visualisations start with questions (Lima), so we started this project from our previous visualisation on the topic of RPGs. Our question is: “Is our previous observation true in actual cases?” To answer it, we took the official novels (the official story text) of World of Warcraft, the War of the Ancients Trilogy. We created two datasets from the text: 1. the relationships between characters and 2. the characters’ identities. The first step was to find the characters in the text, so we used the tool Jigsaw to extract person names. Then we used algorithms to build a relationship table. For the character identity table, we built a data scheme of Gender, Name, Affiliation, and Race. We did this because our question is how well our previous predictions hold up in an actual case, and those predictions were mainly about the gender and the affiliation of characters in RPGs. I won’t talk about the data preparation in detail because it is mainly my colleague’s work. If you would like to explore more about this, please visit his blog post. Still, the data preparation counts as very important in our project because it is the basis for everything else.

Data Analysis w/ Gephi (Jiayu)

I got the list of character relationships from Albert (my teammate), and then I started to use Gephi to visualize the graph data. Though Gephi has been updated since I first used it last year, it still cannot handle spaces inside text. Therefore, before inputting the name data into Gephi to create the relational graph, I eliminated the spaces using the formula =SUBSTITUTE(row, " ", "_") in Excel. (I mention this because it might be really helpful for future Gephi users.) After I input my data into Gephi, the output was like the graph on the left. (Well, it is not exactly the same, but the “DEGREE OF MEANINGLESSNESS” matches.) It looks pretty, but it reveals nothing. Though we can make the strong relationships look more significant and the influential nodes’ color darker, it still tells us nothing.

The next step after creating the relations is to identify each node. Therefore, we combined the information in our identity list with our relational graph, so each node in our dataset now carries its gender, affiliation, and race. We chose those attributes for our node identity scheme because we would like to continue our previous project on RPG games, which revealed facts about gender and affiliation. Gephi’s data management feature works very well because it is able to do a natural join on two data sets (it combines the relations with the nodes’ attributes even though they are two separate datasets; the key (id) we use for that is the name). So our further analysis of the WOW character data covers 1. gender, 2. affiliation, and 3. race.
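For anyone who would rather script the space-to-underscore step than do it in Excel, here is a rough Python equivalent. It is only a sketch: the function underscore_names and the sample rows are hypothetical, and it assumes the rows have already been read from the spreadsheet as lists of text cells.

```python
def underscore_names(rows):
    """Trim each cell and replace inner spaces with underscores."""
    return [[cell.strip().replace(" ", "_") for cell in row] for row in rows]


# Placeholder rows standing in for the exported name columns.
raw_rows = [
    ["Character A", "Character B"],
    [" Character C ", "Character A"],
]
for row in underscore_names(raw_rows):
    print(",".join(row))
```

Trimming first matters, because a stray leading or trailing space would otherwise turn into an underscore and create a second, different node for the same character.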

Gender Analysis on WOW Data:


On the right is a character relation graph with partition coloring based on gender. Green stands for male and red stands for female. The blue ones (yes, they do exist) are those of unknown gender (animals or just an unknown type). The pattern is, well, very straightforward and predictable: males dominate the RPG game and story. Though there are still some red points with strong connections, that won’t change the fact that the story doesn’t really seem to need a female figure, even in an RPG as famous and long-established as World of Warcraft. This pattern is even more shocking if we group the data (the graph on the right). The big green dot is of course the male character group, and the poor unknown group is the small dot in the 4 o’clock direction (if you cannot notice it at first glance, haha, you may read the post on the high-definition webpage to find it). The exploration of WOW supports our previous observation very well.



Affiliation Analysis on Gephi:

Another observation from our previous data analysis on RPG games is that the villains, the bad guys’ affiliation, play the more important roles in RPG games. However, the graph tells a different story for World of Warcraft. On the left is a visualization of the graph grouped by affiliation. Red marks the good guys, black marks the villains, and green marks the characters on the neutral side. The connections between the red points completely shut down our previous prediction about the influence of affiliation in RPG games. When we explore the most significant points (the big green, neutral-side nodes), we find something interesting that may explain the misprediction. The neutral characters, even though they act neutral, don’t have many connections with the villains. Meanwhile, the connections between villains and heroes are always very strong (the widest red lines), and there are only a few connections between different villains.

So we can find a story behind these affiliation relationships. Because the neutral characters have no connections with the villains, we can say that they are more like “background NPCs” than core characters. They aren’t actually involved in the conflict, and they are mentioned because the protagonist meets them. Because the connections among the villains are really weak, we can conclude that they are truly THE VILLAIN: strong enough to fight the heroes without much cooperation. The strong connections between heroes and villains, in turn, probably reflect a massive conflict or the main story line. So we can conclude that the villains in our story are not unimportant, as the graph suggests at first glance. They are just depicted as lonely villains, and the strong connections between villains and heroes prove their importance. However, I would like to say that the villain figure in World of Warcraft is so cliché. It is just a very traditional Byronic Hero.

Race Analysis on Gephi:

The third analysis we built in Gephi is the partition based on race. I would like to support my previous analysis on affiliation with my race analysis. Here the strength of a node’s relationships reveals its level/status in the story, and the size of the node reveals its level of loneliness. This graph shows that in the WOW novels, the race with the most characters is the night elf (the big purple node on the graph), but it is not the most important one. In fact, the red_wyrm (red dragon) has the most influence in the story. There is only one red dragon in WOW’s world, so its node is really small. However, its edges are giant and reach across the world. In the actual story, the red dragon truly is the most influential character, btw. So it supports my previous assumption about how to read the graph.

Graph Theory Analysis on Gephi:

Beyond infographic analysis, Gephi can also do some very interesting data analysis on the graph we created. When we talk about graph theory here, we are trying to use it to answer some statistical questions about the data. We would like to know about the distribution, shape, and density, i.e. how the characters are connected: are they tightly connected or not? On the left is the graph-theoretic analysis from Gephi. Average degree is the average number of connections (influence) per character, and graph density measures how many of the possible connections actually exist. Of those numbers, the most interesting is the average path length. From the six degrees theory, we can predict that in a real-world social network the average path length is about 6. However, in WOW, the average path length is 1.32. That means that you, as a nobody in that world, may connect with our villain in about 2 steps. Relationships in an RPG are really tight. In other words, social status and social barriers are very thin in an RPG game’s world.
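For readers curious how an average path length is obtained, here is a minimal sketch using a breadth-first search from every character. It assumes an undirected, unweighted graph and averages only over pairs that can reach each other. Gephi’s own implementation may differ in details, and the function name and toy graph below are made up for illustration.

```python
from collections import deque


def average_path_length(edges):
    """Mean shortest-path length over all reachable pairs of distinct nodes."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    total, pairs = 0, 0
    for start in adjacency:
        # Breadth-first search gives the fewest steps from start to each node.
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1  # do not count the start node itself
    return total / pairs if pairs else 0.0


# Toy star-shaped graph: one hub character linked to three others.
star = [("Hub", "Character_A"), ("Hub", "Character_B"), ("Hub", "Character_C")]
print(average_path_length(star))
```

In a star like this, every pair is at most two steps apart, which is the same kind of shape that pulls the average path length of a story told around one central character far below 6.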


Comparison to Our Previous Tools: Google Fusion Tables and Palladio (Zhengri):


In this section, I would like to state the conclusion first: Gephi is much more sophisticated than Google Fusion Tables and Palladio because those two’s features are only a subset of Gephi’s. (Palladio has been updated from 1.01 to 1.13 and its performance is very good now.) Google Fusion Tables and Palladio are easy to use compared to Gephi, but the thing is that whatever can be done in Google Fusion Tables and Palladio can also be done in Gephi, while they cannot do what Gephi can do. The first feature they lack is the data management provided by Gephi. Google Fusion Tables and Palladio can hardly do database operations like a theta join or natural join on the dataset, so the data scheme cannot be added to the data relations. Second, the graphic model of Google Fusion Tables and Palladio is insufficient: the only visualization model they can use is Force Atlas. Finally, it is hard for them to do deep data analysis based on graph theory. Comparing Gephi with them is just like comparing Photoshop with Windows Paint. Though they look the same, they live in totally different categories (professional productivity tool vs. temporary tool for fun). They can do simple visualisations, but they are not able to do deep analysis. I should admit that at least they look very nice.