Assignment 2

Jinbo Wang


Prof. Faull

Homework 2


The corpus I chose for this assignment is part of my corpus for final project. For this assignment, I collected  public speeches made by Barack Obama during 2008.  All speeches are found at American Rhetoric. The website claims that  most of materials are for educational purpose, which I think is free to use.  The corpus can be classified in numerous ways, such as chronicle  and geographical, and also  can be further  subdivided  under each mechanism.  Each text in corpus contains 2000 word in average and there are 17 of them in total. Most of them are addressing topics of  presidential election and public affairs in  the US.

The result I got from Voyant is pretty straightforward and easy understanding:download

Voyant first create a word cloud, in which   words that appear more frequently in the source text are more prominent. Although personally I think word cloud is mostly for aesthetic purpose, viewers can still grasp some core value out of corpus.  For example, in this word cloud, words such “new”, “hope” and “change” are relatively dominant,  which meets the political agenda of Barack Obama. Besides the word cloud, Voyant also provides other powerful functionality. It also provides various linguistic statistics of corpus, such as  relative word frequency, vocabulary density and etc. Also I found some patterns  can be drawn easily from statistics but hard to understand at the first glance.  Following is a part of the linguistic statistics collected from my corpus:

Screen Shot 2015-09-23 at 11.06.13 AM

We can easily draw the conclusion that shorter documents used to have richer vocabulary density. Personally I do not think there is an connection within, but certainly it is interesting to investigate.

Comparing with Voyant, Jigsaw put most attention on investigating relationships of entities, such as location and people, appeared among corpus. Due to the nature of Jigsaw, which is designed for intelligence analysis, it is relatively more subjective compared with Voyant. It adopts machine learning technologies to help people understand relations  hiding  beneath the facts,  which provides  more  complicated analysis than Voyant. For my corpus, it helps me to realize  attitudes that Barack Obama towards different public affairs. For example, this is the sentiment analysis of 17 texts from my corpus :

Screen Shot 2015-09-23 at 11.27.57 AM

In this visualization, red suggests negative attitude and blue suggest positive attitude. For the left most text, Obama was talking about climate change. Jigsaw decides it is negative due to the appearance of words such as “combate”, “confront” , which indicate the seriousness of the speech evidently. In the contrary, the rightmost one is the “yes we can”speech,  undoubtedly the most excited speech that Obama has given.

Due to the nature of Jigsaw, it helps me to study Obama’s speech from demographic  and geographic aspects.It is possible that when speaking to different group of people, Barack Obama may use different set of vocabulary, and it is interesting to see if it is just an coincidence or  he really means to it.   From the lab instruction written by Prof, Faull, Jigsaw also has other promising functionalities that reveals the inner relationship and meaning of each entities, the version I have keep crushing on some of the functionalities when I am testing it, which is a pity that I can not  fully explore the functionality of Jigsaw now.

In general, I think both Voyant and Jigsaw can be put into good uses for my corpus. Due to the nature of my corpus, I am interested in investigating both the language use of Obama and also his attitude towards public affairs. For Voyant, it can help me to perform lexical analysis. It reveals vocabulary usage from various way, which can help me his language use in different scenario.   Jigsaw, on the other hand, provides deeper analysis of entities, which help me better understand the corpus in another way.

For both of the platform, they are like puzzles, only providing limited clue at a time, acquiring user to draw conclusions out of  it. Certainly for human brains, we can not process massive amount of data at the same time. Such platforms take advantage of both computing power and visualization ability of computer, revealing things we usually ignored. They are especially powerful to  differentiate subtle connections and hidden relationships among huge amount  texts, digging up things that are impossible to find by human.  Sometimes we might find some ambiguous and plausible conclusion by our self, but for most of time  we can not find enough clue to back our rationale. Both of the platforms help us combine subtle clues into more obvious, stronger evidence to validate potential conclusions.


Some random thought on Dubois artwork

Personally I have been exposed to “art” for a long time, and certainly I can not really judge his work from aesthetic aspect, but what he is trying to express in pretty clear: the power of data, facilitating with nowadays  unprecedented computing power. We certainly produce a lot more data than before, and it help us to understand the world we live in from another way. However, due the difference among each people, there can be different interpretation of one piece of data. People who create them can dress any kind of conclusion from certain data, or simply putting them in there and let viewer decide the meaning of it.

So, not all them are supposed to be meaningful and objective. Certainly yes, we do can get useful information from tons of data by studying its patterns, but not all patterns are useful. For example, one of his work is connecting all his email contacts together by dot and lines. To me, the network drawn has no more meaning than “I know a lot of people”. It does not make any sense statically and mathematically but simply a visualization of some random data set.

However, I found his work of collecting each presidents word use is pretty interesting.  Similar as the word cloud we have seen in the class, he first  developed his own algorithm to get his own data, and then align them in a straightforward  way, a simplified version of word cloud. We can easily grasp what was happened during certain time and how the United States developed till nowadays.


I got the first one under “Art” category:


Since it is not supposed to make any sense statistically, it is more for expressing the complexity and interoperability nature of  web science. Being as a developer myself, I can totally understand what this visualization is trying to tell us. It is a static visualization, in which there are tree like structure in  different colors representing different branch of web development technologies supposedly, such as front end development, web services, database system and etc. Each of them contains nodes and branches, in which presenting difference issues and sub-topics under certain greater topics. Also, these “trees” are intermingled, sharing same nodes, which indicate the fact that most of subjects in web science can not be studied or implemented standalone and acquire interdisciplinary thinking . In Chapter 2 of Lima, he mentions this kind of tree structure has “a significant influence in postmodern thinking, particularly in areas like communication theory, cyberspace theory, complex systems…”(44 Lima). I see this visualization as a perfect proving of his theory, as modern technologies become more subdivided  and simply more complicated  than any before.


Since I am also a fan of music of any kinds, I also found this one really interesting.

Here is the website for the origin visualization:

This one is called ” How music travel”. It is a dynamic visualization of the evolution of western dance music. Personally I think what make this visualization powerful and straightforward is the dynamic feature. From the screenshot we realize that there are tons of genres of music crossing different continents, and we can not really get any useful conclusion out of  this static screenshot. Due to the nature of the data, the evolution of music, which is developing  chronologically, dynamic visualization can be put in a good use here by only exposing certain part of data,which arranged in the order of time, to user at a time. Like Sinclair mentioned in his article, “Interactive visualizations, on the other hand, aim to explore available information, often as part of a process that is both sequential and iterative.” This visualization is undoubtedly one of the best presentations of how different visualization should be used properly in order to maximize the understanding  of the certain data.