Final Project - Yifu

Our final project is called “Poker Face,” and it is an analytical study of Obama’s speeches from 2008 to 2012. Jiaming and Jinbo were both very helpful and thoughtful throughout the whole project.

  • The Start

The first question that drew us to this field was “How does a president speak in different situations?” We thought it would be interesting and meaningful to look at how the speeches varied with the circumstances in which they were given. Later, on the advice of Professor Faull, we decided to also look at the topics that President Obama talked about around the world.

  • The Corpus

The corpus consists of almost 200 speeches from the five years between 2008 and 2012. The first thing we wanted to know about each speech was which categories it belongs to. Following the general question we came up with at the beginning, we assigned each speech a time, a place, a topic and an audience. We recorded all of this in a .csv file so that we could sort the 200 speeches and do the research and analysis we wanted. This was the most time-consuming step: it took the three of us almost two weeks to read through every speech and put each one into the right categories. Some patterns were very clear once we finished the file. Almost half of the speeches took place in D.C., where the White House is located. The most frequent word in speeches to students and at commencements is “education,” and the next most frequent are “job” and “economy.” These are interesting observations even before we dig deeper into the data. Why does Obama give speeches to high school students and graduates? To encourage them to use their education to find good jobs and make the U.S. economy better, isn’t it?
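The word counts above came from our tools, but the same kind of count can be sketched in a few lines of Python. This is only a minimal illustration: the field names `audience` and `text`, the tiny stop-word list and the two sample speeches are all made up for the example and are not taken from our real .csv file.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a much longer one.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "for", "our", "we", "that"}

def top_words(speeches, audience, n=3):
    """Return the n most frequent non-stop-words in the speeches for one audience."""
    counts = Counter()
    for speech in speeches:
        if speech["audience"] != audience:
            continue
        # Lowercase the text and keep only runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", speech["text"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up sample rows, not speeches from the real corpus.
sample = [
    {"audience": "Students", "text": "Education matters. Education leads to a job."},
    {"audience": "Military", "text": "Security is our mission."},
]
print(top_words(sample, "Students", n=2))
```

With a real corpus, the `text` field would hold the full speech and the audience labels would come from the categorized spreadsheet.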

After we finished the Excel sheet, the next step was to sort the corpus itself. We created several folders and sorted the 200 speeches by time, place and audience for further research. We also built another spreadsheet for Gephi that contains every single word Obama used in these five years (it is nearly 70,000 lines long).

  • The First Step

We followed the order in which our course introduced the software and platforms, and we tried each one to see whether it would give us something useful or meaningful. So the first step was the powerful word analysis tool Voyant. The best thing about Voyant is that its word cloud is really clear and easy to adjust. Following our interests, we put the sorted texts into Voyant. As the following pictures show, each word cloud is clearly different. In 2012 the word “Romney” takes up a huge part, which reflects that year’s election. And when the texts are sorted by audience, the clouds show that Obama really does talk about different things to different people. One could guess which cloud belongs to which audience group even if I erased the labels.

[Images: Voyant word clouds of the sorted speech texts]

One more interesting finding is the total word usage.

[Image: total word usage by audience]

According to the data above, Obama used more than twice as many words when talking to politicians as when talking to the military. It is reasonable to conclude that the writer behind Obama’s speeches thought about how to make each speech as effective as possible. In fact, we have all had the experience of needing to say the same thing in different ways to different people. So we can see that the strategy and construction of a speech really depend on whom it is given to.

  • The Map

Another very important part of our project is the map. We got the idea of labeling each place with a single word from the DuBois exhibition early this semester. We all thought this would make our project more attractive to readers. As the following pictures show, each place or country where Obama gave speeches was assigned a word.

[Images: word map of the places where Obama gave speeches]

The word/world map represents the speeches geographically. The most useful way to read the map is to ask how each place or country is connected with American politics and the American economy. It makes sense that words like “Nuclear” appear on Moscow and “Security” on Afghanistan. There are also “Partnership” on Australia and “Democracy” on Brazil, which are not shown in the images above.

To make this map, I first ranked all the places and countries in decreasing order of the number of speeches given there, so that the big, important words would go to the matching places. Then I put the texts sorted by place into Voyant and looked for the most frequent word. As I worked down the ranked list, I avoided reusing words so that the map would be more meaningful. I also disregarded the most common words, like “American” and “People.”
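I did these steps by hand with Voyant, but the procedure can be sketched in Python as follows. This is a rough illustration under stated assumptions: the function name `assign_map_words`, the input dictionaries and all the numbers in the example are invented here, and only the two skipped words come from the description above.

```python
from collections import Counter

# The very common words that were disregarded when labeling the map.
COMMON_WORDS = {"american", "people"}

def assign_map_words(word_counts, speech_totals):
    """Give each place its most frequent word that is not common or already used.

    word_counts:   {place: Counter of word frequencies for that place}
    speech_totals: {place: number of speeches given there}
    """
    used = set()
    labels = {}
    # Places with more speeches choose first, so they get the "bigger" words.
    ranked = sorted(speech_totals, key=speech_totals.get, reverse=True)
    for place in ranked:
        for word, _count in word_counts[place].most_common():
            if word in COMMON_WORDS or word in used:
                continue
            labels[place] = word
            used.add(word)
            break
    return labels

# Made-up counts for illustration only.
speech_totals = {"Kabul": 2, "Washington, D.C.": 90}
word_counts = {
    "Washington, D.C.": Counter({"people": 40, "economy": 30, "security": 10}),
    "Kabul": Counter({"economy": 6, "security": 5}),
}
labels = assign_map_words(word_counts, speech_totals)
print(labels)
```

In this toy example Kabul’s top word is already taken by D.C., so it falls back to its next most frequent word, which is the same rule I applied when going down the ranked list.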

  • Relationships

[Image: network of places and topics in Google Fusion Tables]

Besides word analysis, we were also interested in possible relationships among his speeches. The visualization above shows the relationship between places and topics in Google Fusion Tables. D.C., the biggest node, is in the middle as expected, and almost all of the topics (yellow nodes) are around it. We can also see that the Religious topic came up only twice in these five years. This visualization is interactive on our website: one can drag the points to see the relationships.

  • World of Word

[Image: Gephi graph of the words used in the speeches]

This might be the most complex and beautiful visualization we have done so far. To my mind, Gephi is the strongest tool for data visualization, but I have not made full use of all its functions. This 10,000-node graph shows all the words used in the five years. It looks like a set of tree rings. There are five rings in total if we ignore the scattered points outside the graph, which are the words that appeared only once. Each of the five rings represents the words used in a certain year.

  • Conclusion

I like this topic very much, and we put a lot of effort into this final project as a group. I think we could do much more with the corpus we have. Combining sentiment analysis in Jigsaw with a timeline in Palladio would be worth trying, but we did not have enough time to do everything. The Gephi visualization could also definitely be analyzed in more detail.

Assignment 5 (Yifu Qu w/ Jiaming Zhu)

Lately we have been digging into Gephi for more advanced techniques in data visualization. Gephi definitely seems to be a stronger tool than Palladio and Google Fusion Tables, but I could not figure out all of its functions in two weeks. I used my own data set of Nobel Peace Prize winners, the same one as in Assignment 3.

[Image: spreadsheet of Nobel Peace Prize winners]

This data set is quite small: my original .csv file has only 30 lines and 7 columns. I thought it might be too small for Gephi, because all the sample data sets I had tried in Gephi were huge, up to thousands of lines, and made very beautiful and grand visualizations. But the first time I tried my original data set in Gephi there were serious problems: Gephi takes every single word in every single cell to be an independent node. So the outcome of my first trial was not very welcoming: a name like Barack Obama was cut into two nodes, “Barack” and “Obama,” although the graph did show a connection between these two nodes (it had better!). And all the image URLs became nodes with long and ugly labels.

The first try was not very successful, but I did get a sense of how Gephi works. In Palladio and Google Fusion Tables, one can only show the relation between two features (source to target). That makes the picture easy to read: you know what it is about at first glance. Gephi, however, displays everything you imported at the same time. It looks awful at first, since all your features (age, gender, name, place) are mixed together in one disordered picture.

What I did to get it to work was to delete columns from my data. If I want to see how “gender” and “profession” are associated, which I had done in Palladio, I keep only these two columns in my .csv file. Then Gephi gives a very clear graph showing all the connections between these two features. So what if I add one more column? Then Gephi can do something we could not achieve in Palladio or Google Fusion Tables: it shows the connections among three features. The graph is not very complicated because only 3 columns of data were imported. But to keep the visualization meaningful, we cannot import a file with too many columns.

Here’s the visualization that I did in Gephi. Note that in Gephi we can choose the layout that arranges the nodes: Force Atlas, Yifan Hu, etc. In Palladio or Google Fusion Tables these things are all preset for you. The modularity function also groups and colors all the nodes by similarity and difference. I am not sure exactly how it works, but in the following picture the groups seem to be divided by the degree or by similarity in certain features of the nodes. Nodes connected with Male and Female are clearly in two different colors.

[Image: Gephi graph of the Nobel Peace Prize data]
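I trimmed the columns by hand in the spreadsheet, but the same preparation can be sketched in Python. This is a minimal sketch, assuming Gephi’s spreadsheet import treats a file with `Source` and `Target` columns as an edge table; the helper name `edge_list` and the two sample rows (including the profession labels) are only illustrative.

```python
import csv
import io

def edge_list(csv_text, source_col, target_col):
    """Keep only two columns of a CSV and rename them Source and Target."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for row in reader:
        # Each row becomes one edge between the two chosen features.
        writer.writerow([row[source_col], row[target_col]])
    return out.getvalue()

# Illustrative rows only; the real file has about 30 lines.
sample = (
    "name,gender,profession\n"
    "Barack Obama,Male,Politician\n"
    "Malala Yousafzai,Female,Activist\n"
)
edges = edge_list(sample, "gender", "profession")
print(edges)
```

For three features, one option would be to call the function twice, for example once for gender and profession and once for profession and birth place, and then combine the rows into one file.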

I used to think that Palladio was very beautiful software compared to Google Fusion Tables, but I found that Gephi is even stronger at producing good-looking results. The “Preview” page gives wonderful output for the visualization. Gephi is definitely a well-developed tool for data visualization, but one needs to prepare a suitable data set to get a good outcome.

[Image: Gephi visualization]

For my own data set, the outcomes in Palladio/Google Fusion Tables and Gephi are quite similar. One difference is that I used 3 columns of data in Gephi, which I cannot do on the other two platforms. It may be that my data set is too small to find any significant difference between these tools. But overall I think Gephi is stronger, and I would like to use it for part of my final project since it really produces beautiful pictures.

Assignment 3

I chose the Nobel Peace Prize winners (1994-2014) as the subject of my research. I didn’t stick with the stock market topic because that one is barely geographical. Through this research I would like to find the similarities and differences among Nobel Peace Prize winners. Here is the spreadsheet I constructed.

[Image: spreadsheet of Nobel Peace Prize winners]

I have 8 columns and 30 rows (29 winners in 20 years). The columns are “year, name, birth place, year of birth, gender, age when got the prize, profession and image URL,” because those are the things among which I try to find connections and relations in Palladio. I also constructed a place-coordinates table in order to use the “map” view in Palladio.

To be honest, I love Palladio more than Google Fusion Tables because I feel that Palladio is more user-friendly and aesthetically pleasing. Google Fusion Tables is definitely a stronger tool, but I can’t get enough out of it with my data (maybe it works better for larger data sets).

Here are some visualizations from Palladio:

[Image: Palladio data view]

This one shows what I think is the metadata of my data set. It is very clear that I have 29 entries in the spreadsheet, and each of them has a value for “year, name, birth place, year of birth, gender, age when got the prize, profession and image URL.” The map part, which you can see next, was quite confusing for me to work with. Palladio does not work with place names; instead it eats geographical coordinates and spits out points on the map. So I used Google Maps to find a coordinate for each place. The map looks ordinary and reasonable enough, so I can probably guess that the Nobel Prize Committee does not have any preference among countries and regions for the prize.

[Image: Palladio map view]
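I looked up each place by hand, but the step of attaching coordinates to place names can be sketched in Python. This is only a rough sketch: it assumes Palladio accepts coordinates written as a single “latitude,longitude” text field, and the lookup table `COORDS`, the column name `coordinates` and the rounded coordinate values are all illustrative.

```python
# Hypothetical place-to-coordinates table with approximate (latitude, longitude) values.
COORDS = {
    "Honolulu": (21.31, -157.86),
    "Mingora": (34.77, 72.36),
}

def add_coordinates(rows, coords):
    """Return copies of the rows with a 'coordinates' field added for each birth place."""
    out = []
    for row in rows:
        lat, lon = coords[row["birth place"]]
        # Store the pair as one "lat,lon" string (assumed input format for the map view).
        out.append({**row, "coordinates": f"{lat},{lon}"})
    return out

# Two sample rows; the real spreadsheet has one row per winner.
winners = [
    {"name": "Barack Obama", "birth place": "Honolulu"},
    {"name": "Malala Yousafzai", "birth place": "Mingora"},
]
located = add_coordinates(winners, COORDS)
print(located)
```

A place missing from the table raises a `KeyError`, which is a simple way to notice that a coordinate still needs to be looked up.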

The table view in Palladio is also quite useful. It helped me categorize my data set based on the criteria I chose. For example, the following screenshot shows my data grouped by year. You may notice that some years have multiple winners. A small flaw is that I cannot put the rows in chronological order, so the first column looks clumsy.

[Image: Palladio table view grouped by year]

The biggest thing in Palladio for my data set is the connection part. That’s where all the different pieces of information get ordered and connected. I chose 3 meaningful visualizations as examples: “Profession-Birth Place,” “Gender-Profession” and “Year-Profession.” These 3 visualizations reveal a lot of interesting hidden information: almost all the winners have something to do with politics, many more males than females have won the prize, and so on.

[Images: three Palladio connection visualizations]

These visualizations, to my mind, really gave me a brand new perception of my data. Once the data is made into graphs or different arrangements of elements, certain meanings are attached to them, and people can grasp those meanings easily and intuitively. That is the biggest power of data visualization.