Assignment 2: Myth, Reason, Faith

voyant creation

My corpus, a compilation of class notes from the Comparative Humanities core course “Myth, Reason, Faith,” was inspired by the creative process of generating essay topics. My goal is to visualize thematic patterns across texts so that potential essay topics are easier to notice. I organized the notes into a separate document for each text. For essay writing, Voyant’s “trends” feature is useful for tracking terms over time, as long as the user sorts the documents by age beforehand. I am interested in the “creation” trend because most of its occurrences fall in the earliest texts of the course – with the exception of the Aeneid which, modeling itself after the Homeric texts, is the creation epic of Rome. Although the Epistles also mention “creation,” the word is much more common in the pagan creation texts. There must be some reason why the monotheistic authors did not feel the need to write in depth about creation. Were some notions of creation passed down from paganism to monotheism? In general, using my corpus in Voyant seems more useful for studying than for essay writing. The word cloud and the corpus summary would be especially helpful when studying for the final oral exams.
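The idea behind a trend line can be sketched in a few lines of Python. This is only an illustration of the general technique (the share of all words in each document that match one term, with documents kept in chronological order), not a description of how Voyant computes its graph. The function name `term_trend` and the sample notes are hypothetical.

```python
import re

def term_trend(documents, term):
    """Return (title, relative frequency) pairs for `term`, one per document.

    `documents` is a list of (title, text) pairs that the caller has
    already sorted from the oldest text to the newest.
    """
    trend = []
    for title, text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Share of all tokens in this document that equal the term.
        share = tokens.count(term.lower()) / len(tokens) if tokens else 0.0
        trend.append((title, share))
    return trend

# Made-up placeholder notes, not the actual course corpus.
notes = [
    ("Text A", "creation of the world and creation of the gods"),
    ("Text B", "war and wrath and honor"),
    ("Text C", "a new creation in faith"),
]
for title, share in term_trend(notes, "creation"):
    print(f"{title}: {share:.2f}")
```

Because the function trusts the order of the input list, sorting the documents by age first matters here just as it does in Voyant.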

voyant word cloud

The word cloud displays the course’s most common words, which could be a good preview of the questions that might be asked on the final. The “distinctive words” in the corpus summary offer a quick refresher for every text. Not every text is summarized in a useful way: The Iliad is reduced to “achilles (6), patroclus (5), briseis (3), tears (2), objects (2).” But because I am already familiar with the epic, the list is enough to jog my memory.

jigsaw list sacrifice

The visualizations Jigsaw offers are more directly useful for the purpose of essay inspiration.  If I wanted to write an essay about sacrifice, I could click on the concept in the list view and it would find all the texts in which “sacrifice” is mentioned.  Jigsaw’s graph and circular graph views are most elegant for comparing two concepts.
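A rough Python analogue of these two lookups is sketched below. It assumes a plain substring match, which is much cruder than Jigsaw's entity identification, and the function names and sample notes are hypothetical.

```python
def documents_mentioning(corpus, concept):
    """Return the set of document titles whose text contains `concept`.

    `corpus` maps a title to its text. Matching is a case-insensitive
    substring test, so "nature" would also match "natures".
    """
    needle = concept.lower()
    return {title for title, text in corpus.items() if needle in text.lower()}

def shared_documents(corpus, first, second):
    """Return the sorted titles of documents that mention both concepts."""
    return sorted(documents_mentioning(corpus, first)
                  & documents_mentioning(corpus, second))

# Made-up placeholder notes, not the actual course corpus.
corpus = {
    "Notes 1": "sacrifice and homecoming; nature as a force",
    "Notes 2": "nature imagery only",
    "Notes 3": "homecoming delayed, nature hostile",
}
print(sorted(documents_mentioning(corpus, "sacrifice")))
print(shared_documents(corpus, "homecoming", "nature"))
```

The first call mirrors clicking one concept in a list view; the second mirrors comparing two concepts and keeping only the texts they have in common.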

jigsaw graph nostos,nature

For example, I considered the concepts “Nostos” (homecoming) and “nature.”  Both concepts are mentioned in The Oresteia and Daniel, and the topics of “God,” “Greece,” and “power” also share common ground in the two texts.  The ability to view such connections all at once makes it simpler to discern which essay topics are more viable than others.  The process of coming up with humanities essay topics is the same with or without Voyant and Jigsaw, but the visualization platforms offer multidimensional viewpoints that expose every possible connection, not just the tired connections (i.e. the “simplified and immutable truths”) every student learns about from professors’ canonized lectures and chooses to write about based on familiarity.  The process of creating this corpus and choosing the entities with which to analyze it also helped me think about “Myth, Reason, Faith” critically.  Voyant and Jigsaw do not entirely remove the human element of creativity within the humanities because the humanist must still choose which concepts to analyze; but the platforms might function as an effective essay outline.

Assignment 2- Yifu

An Analysis of News and the Chinese Stock Market

  • Construction of corpus

The topic I focused on is the Chinese stock market. I wanted to see if there is any clear connection between news reports on the Chinese stock market and its turbulence (bull, i.e. good and rising, markets and bear, i.e. bad and falling, markets). I chose two critical time periods in the Chinese stock market: the biggest bull market, in 2007, and the recent bear market in 2015. Around 30 news articles were collected manually for each period from 5 major English news portals: BBC, USNEWS, Chinadaily, Reuters and NY Times. These articles are very relevant to the topic because I selected them myself (subjectively), and I picked articles from slightly earlier in the timeline because I wondered whether news reports foresee the upcoming rise or fall in the stock market.

  • In Voyant

bear1111 bull das

These two pictures show the word frequencies for the bull (top) and bear (bottom) markets. After I cleared out some of the misleading words like “china” and “stock”, the two pictures look reasonable enough. The reader needs to look at a very subtle level, ignoring general stock terms like “percent” and “index”. In the left picture, “selling”, “fell”, “brokerages”, and “lost” show up quite a few times. But in the right one none of these appears. Instead, “large”, “development”, and “growth” appear in many news reports.
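The cleaning step described above (dropping words such as “china” and “stock” before counting) can be sketched in Python as follows. This is a minimal illustration, not Voyant's implementation; the built-in stop-word list is a tiny assumption of mine, and `top_words` is a hypothetical name.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list, assumed for this sketch only.
BASIC_STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "on", "as"}

def top_words(text, extra_stopwords=(), n=5):
    """Return the `n` most frequent words after removing stop words.

    `extra_stopwords` holds corpus-specific words that would otherwise
    dominate the picture, such as "china" or "stock".
    """
    stopwords = BASIC_STOPWORDS | {w.lower() for w in extra_stopwords}
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)

# Made-up sample sentence, not taken from the collected articles.
sample = "China stock fell, the stock index fell, and China shares lost and fell"
print(top_words(sample, extra_stopwords=["china", "stock", "index"], n=2))
```

Passing the misleading words in as `extra_stopwords` keeps the general list separate from choices that only make sense for one corpus.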

bull fallbear fll

Again, we can see the word connections in Voyant. In the top left one (bull market), “rise” has 17 connections while “fell” has 16. In the right one, “rise” has only 3 connections but “fell” has 21. And in the bear market one, the word “rise” is sometimes connected to “crisis”.
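One common way to count such connections is to look at the words that appear within a few positions of a keyword. The Python sketch below assumes that windowed definition; Voyant's own definition of a connection may differ, and `collocates` is a hypothetical name.

```python
import re
from collections import Counter

def collocates(text, keyword, window=2):
    """Count the words found within `window` tokens of each `keyword` hit."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - window)
            # Neighbours on both sides, excluding the keyword itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Made-up sample sentence, not taken from the collected articles.
sample = "shares fell sharply today shares fell again"
links = collocates(sample, "fell", window=1)
print(len(links), dict(links))
```

Here `len(links)` plays the role of the number of connections for one word, so the same function could be run for “rise” and “fell” on each corpus and the counts compared.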

  • In Jigsaw

The picture is almost the same in Jigsaw, but Jigsaw focuses more on entities than on particular words. So in my research on the connection between news and the stock market, Jigsaw is somewhat less useful than Voyant. But one thing I found very interesting is the sentiment analysis.

bear niu

These two bars represent the sentiment of each text. The more blue, or further to the right, the more sad or negative words in the text. The more red, or further to the left, the more happy words in the text. Without even guessing, people can clearly see that the top one refers to the bull market while the bottom one refers to the bear market.

This sentiment analysis is crucial because when people want to know how the stock market is behaving, the most convenient resource for them is the news. And with the sentiment analysis, they could, hopefully, foresee upcoming turbulence or a jump in the market.
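A very simple way to score a text like this is to count words from a positive list and a negative list. The Python sketch below assumes that lexicon-based approach; it is not a description of how Jigsaw calculates its bars, and both word lists are small invented examples.

```python
import re

# Tiny hand-made lexicons, assumed for this sketch only.
POSITIVE = {"growth", "gain", "rise", "rally", "strong"}
NEGATIVE = {"fell", "lost", "selling", "crisis", "crash"}

def sentiment_score(text):
    """Return a score from -1.0 (all negative) to 1.0 (all positive).

    Only words found in one of the two lexicons are counted; a text
    with no such words gets 0.0.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    positive = sum(t in POSITIVE for t in tokens)
    negative = sum(t in NEGATIVE for t in tokens)
    total = positive + negative
    return (positive - negative) / total if total else 0.0

# Made-up sample sentences, not taken from the collected articles.
print(sentiment_score("strong growth and a rally"))
print(sentiment_score("shares fell and investors lost money in the crash"))
```

A score near 1.0 would correspond to the happy end of the bar and a score near -1.0 to the sad end. A real lexicon would need to be far larger and would still miss negation such as “did not fall”.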

  • Comparison

I would say that both platforms are very useful, but they work in different ways. Voyant seems to be more compatible with every kind of text, no matter how long or short, because Voyant analyzes all the texts together (compared to Jigsaw). Jigsaw is pickier about texts: you have to give it many different texts so that it can identify entities. I’d say that for certain areas of research, Jigsaw would be more appropriate. I used Voyant more throughout my research.

  • Conclusion

As for Tanya’s argument, I don’t know if there is a right answer, but I do think that by doing text analysis people can see many things and connections that they would not notice if they only read the text as a whole. “these cameras and the resulting images” did provide me with multidimensional and interesting views of the sources. I would like to work on this more with my major topic if I have the chance in the future. I believe it would be very thought-provoking to link math with statistical texts.


Assignment 2 – RPG Video Games, and Real World Influence

Project Inspiration

As a huge fan of video games, I was, unsurprisingly, super excited when I heard Jiayu’s idea of visualizing games for our upcoming digital humanities project, and I decided to work with him immediately. As for the details, we found it interesting to analyze the relationships among video games’ elements, themes, and factors, and how those may change over time. Because we were amazed by the vast number of games released since 1975, we decided to narrow our data source down to RPGs (role-playing games) only; since plot is an important factor in most RPGs, it should be easier to extract their elements, factors, and themes simply from the summaries of their Wikipedia pages.

Corpus Construction

As mentioned above, we decided to use the summaries of Wikipedia pages as our text source. Since there are more than 4000 RPGs released since 1975 recorded on Wikipedia, it is not surprising that a software package was needed to help us get the texts (copying and pasting all the summaries by hand would be a pain). So we chose the wikipedia package for Python to do the work. We constructed a list of all the game names in an Excel file, had the program read the file, used the wikipedia package to fetch each summary, and wrote the summary paragraphs into a file. But then a problem arose: because of how Jigsaw is implemented, a very large file produces poor results and often slow processing; for Voyant, if we have too many pieces of text, the generated trend line is very difficult to read and interpret. So we finally decided to slice the texts so that we have around 30 files for Voyant and around 2000 files for Jigsaw, to help it understand our corpus well. But when we looked into individual files carefully, we found another problem: because of the limitations of our algorithm and of the wikipedia package, we got a lot of junk information. Therefore, we decided to use an online algorithm called DedupeFS to eliminate the junk information.
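The fetch-and-slice steps might look roughly like the Python sketch below. It is an illustration under stated assumptions, not our original script: it assumes the game names are already loaded into a list (reading the Excel file is left out), it assumes the third-party wikipedia package is installed, and the names `collect_summaries` and `write_slices` are hypothetical. The only call to the package is `wikipedia.summary(title)`.

```python
import os

def fetch_with_wikipedia(title):
    """Fetch one page summary with the third-party `wikipedia` package."""
    import wikipedia  # imported lazily; assumes `pip install wikipedia`
    return wikipedia.summary(title)

def collect_summaries(titles, fetch):
    """Map each title to its summary, skipping titles that fail to load."""
    summaries = {}
    for title in titles:
        try:
            summaries[title] = fetch(title)
        except Exception as err:  # e.g. a missing or ambiguous page
            print(f"skipped {title!r}: {err}")
    return summaries

def write_slices(summaries, out_dir, n_files):
    """Write the summaries into at most `n_files` text files in `out_dir`."""
    os.makedirs(out_dir, exist_ok=True)
    items = list(summaries.items())
    size = max(1, -(-len(items) // n_files))  # ceiling division
    paths = []
    for start in range(0, len(items), size):
        path = os.path.join(out_dir, f"slice_{start // size + 1:04d}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            for title, text in items[start:start + size]:
                handle.write(f"{title}\n{text}\n\n")
        paths.append(path)
    return paths
```

With a list of game names in `titles`, `write_slices(collect_summaries(titles, fetch_with_wikipedia), "voyant", 30)` would produce the coarser split, and passing 2000 with another folder name would produce the finer one. Passing the fetch function as an argument also makes it possible to try the slicing on a few hand-written summaries without any network access.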

Voyant analysis (Jiayu Huang)

Screen Shot 2015-09-23 13:32:14 +0000

The result of the relationship analysis is pretty amazing to me. Because the vast majority of RPGs take a fantasy approach to their themes, terms like “dragon”, “demon”, and “monster” appear frequently throughout the corpus, which is not surprising because that is often the clue to, or the ultimate goal of, a fantasy RPG: to kill the biggest enemy threatening the world. What is surprising is that the “good” people are not mentioned often; the words “hero”, “angel”, “warrior”, etc. do not appear that often, even though their help might be essential for completing quests. And then there is “quests”. It is not surprising to find it in the middle of the figure, since almost every essential element of RPGs connects to it, and it is the bridge that connects all the essential elements with each other.

In addition, some interesting facts:

  1. The princess is connected with the knight only by one word: book. So even fantasies know that the knight and the princess live in a fantasy world.
  2. Boys are connected with the word “named”, while the heroine connects to the word “unnamed”.
  3. Angel, although mentioned fairly frequently, has a very weak relationship with the other essential factors; it connects to them only through the word dungeon.
  4. Also, though perhaps less interesting, female characters do not occur frequently. Only the words “princess” and “heroine” reflect female characters and, unfortunately, they are not strongly connected to the main frame.

TimeLine analysis

Screen Shot 2015-09-23 13:55:21 +0000

Screen Shot 2015-09-23 13:46:51 +0000

Tactical, action, and strategy are three main types of RPGs, and a very interesting pattern emerges: when tactical and action games are dominant, strategy games are not; when strategy games become popular, the other two fall out of favor. This is probably due to the wider acceptance of video games over time: action and tactical games are usually considered “hardcore” games, which are difficult to play and played mostly by hardcore gamers. As video games have become more and more accepted, easy-to-learn games like strategy games have become more and more popular in the industry.

Screen Shot 2015-09-23 13:49:44 +0000

More interestingly, it seems that gamers don’t like love; instead, they like wars. Of course, this is because most RPG plots are based on wars. But surprisingly, love is rarely mentioned until a certain point in time. After that, the frequency of “love” usually follows the same trend as “war” over time, which means love usually comes with war.

Jigsaw Analysis (Zhengri Fan):

Compared to Voyant, Jigsaw is not that useful overall, especially in our circumstances. Jigsaw’s advantage is identifying entities, that is, grouping nouns so that the members of each group share a common feature. But in our game analysis, only certain nouns recur across the corpus. For the most part, different games have different terms and settings, which made it hard to identify entities in our corpus. Moreover, for word relationships, Jigsaw lacks the ability to put multiple words in a single figure to show their direct relationships, while Voyant is easier to use and shows the relationships between different keywords more clearly.

But we could still get some use out of Jigsaw.

For example


This is a list view in Jigsaw, listing organizations and game names in order of word frequency. It is not surprising that “Playstation 2” ranks highest among all organizations, as it is one of the most popular game consoles in history. Furthermore, “Final Fantasy” is mentioned the most among all the game names, as it is the world’s most influential RPG.


Another useful feature of Jigsaw is the word tree feature.



Based on the previous Voyant analysis, we could see that female characters are not mentioned frequently in RPGs, and this is also supported by the word tree feature in Jigsaw. As we can see, many more words connect to “boy” than to “princess”, which could suggest that princesses are not often mentioned in RPGs, or at least are not important characters in the games.

As mentioned, since Jigsaw groups nouns, it is useful for our project in that it could help us identify the relationships of the games with different people and organizations. But since that is not the main concern of our project, we chose not to discuss it in detail. Voyant, being a plainer statistical tool than Jigsaw, could help us bring out more aspects of the corpus we created, in many more ways than Jigsaw.

Reflection on Tanya’s Reading

Our corpus is no more than plain text. By applying Voyant and Jigsaw to it, we could see those cold-blooded words become more meaningful reflections of the real world, of humanity. Voyant and Jigsaw are very different tools; by applying both of them, we could analyze our raw data from different angles. And by knowing the data from different angles, we learned the subject in a detailed and profound way: we are not only focusing on one specific plane of the world, but are seeing the whole issue as a three-dimensional object built by the tools; we are not only seeing the cold-blooded data, but also seeing people’s viewpoints on them. And, at last, we connected the digital tools with the humanities.