Assignment #6 – JZ

Our research topic is “How President Obama’s speeches changed from 2008 to 2012,” and we analyze the language of his speeches along several dimensions: location, time, audience, and topic.
Our dataset consists of Barack Obama’s speeches from 2008 to 2012. We chose this period because many major events happened during these five years, including the economic crisis, the presidential election, and the violence in Libya. Also, since we wanted to use Gephi to visualize every word Obama used in his speeches, five years of speeches seemed like a manageable amount of text. We first copied and pasted each speech into an individual txt file to create the corpus and saved the files by year. We also grouped the txt files by location to make the map and by audience to analyze word usage.
Then we read through each speech and recorded its location, topic, audience, and year. The locations and years were easy to find, but the topics and audiences were not: the topics were broad and the audiences were hard to identify. To avoid ending up with too many categories to learn anything useful, we grouped the topics into Economic, Social, Security, Political, and Military. We also grouped the audiences into Public, Student, Military, and Politician.

The first analysis we did was of word usage, using Voyant. We loaded each year’s corpus into Voyant and made five word clouds to find the key words of each year. The word clouds suggest that Obama’s speeches closely follow the major issues of each year. For example, in the word clouds for 2008 and 2009 we see words like economy, work, and crisis, which relate to the economic crisis. In 2011, the speeches are more about security and rights because of the violence in Libya. In 2012, the presidential election plays an important role, so we see words like president, Romney, and governor.
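The counting behind a word cloud can be sketched in a few lines of Python. This is only a rough illustration, not what Voyant does internally: the stop-word list, the `top_words` function, and the two sample sentences are all made up for the example and are not taken from the real speeches.

```python
import re
from collections import Counter

# Hypothetical stop-word list; Voyant applies its own, much larger list.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "we", "is", "our"}

def top_words(text, n=5, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Toy stand-ins for the per-year corpora (not real speech text).
corpus_by_year = {
    2008: "The economy is in crisis and the economy needs work.",
    2012: "Governor Romney and the president debated. Romney spoke first.",
}

for year, text in corpus_by_year.items():
    print(year, top_words(text, n=3))
```

With real data, each value in `corpus_by_year` would be the combined text of that year’s txt files.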

We also made word clouds of the speeches grouped by audience. What we found interesting is that Obama’s speeches to students are mostly about the economy: the words “economy” and “financial” appear many times.


To follow up on this, we analyzed the relationship between topic and audience using Google Fusion Table. In the visualization, blue dots represent audiences and yellow dots represent topics. If we focus on the blue dot labeled Student and the yellow dots connected to it, the line connecting Student and Economic is much thicker than the other lines. This further supports the observation that when Obama speaks to students, he discusses economic issues more.
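The line thickness in this kind of network chart comes down to how often an audience and a topic appear together. A minimal sketch of that count, assuming each speech has been reduced to one (audience, topic) pair; the rows and the function names below are invented for illustration and are not our real classification table:

```python
from collections import Counter

# Hypothetical rows: one (audience, topic) pair per speech.
speeches = [
    ("Student", "Economic"),
    ("Student", "Economic"),
    ("Student", "Social"),
    ("Military", "Security"),
    ("Public", "Economic"),
]

def edge_weights(pairs):
    """Count how many speeches link each audience to each topic."""
    return Counter(pairs)

def strongest_topic(pairs, audience):
    """Return the topic most often paired with the audience, or None."""
    weights = Counter(topic for aud, topic in pairs if aud == audience)
    return weights.most_common(1)[0][0] if weights else None

weights = edge_weights(speeches)
print(weights[("Student", "Economic")])
print(strongest_topic(speeches, "Student"))
```

The heavier the count for a pair, the thicker the line between the two dots would be.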


We also counted the number of different words Obama used with each audience group and found that he used more when talking to politicians and students than when talking to minority and military audiences. However, this conclusion may be biased: there are more speeches for the first two groups, which gives them a larger base of text to draw vocabulary from.
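One possible way to reduce this bias, which we did not do in the project, is to compare the groups on the same amount of text, for example by counting distinct words only within the first N tokens of each group. The sketch below assumes that approach; the function names and the sample text are hypothetical.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def vocab_size(tokens, sample_size=None):
    """Count distinct words, optionally within the first sample_size tokens."""
    if sample_size is not None:
        tokens = tokens[:sample_size]
    return len(set(tokens))

# Toy text per audience group (not real speech text).
texts_by_audience = {
    "Student": "jobs and schools and jobs and loans and college and jobs",
    "Military": "service and duty",
}

tokens = {aud: tokenize(text) for aud, text in texts_by_audience.items()}
smallest = min(len(toks) for toks in tokens.values())

for aud, toks in tokens.items():
    # Raw vocabulary size, then vocabulary size on an equal-length sample.
    print(aud, vocab_size(toks), vocab_size(toks, sample_size=smallest))
```

In this toy data the Student group has the larger raw vocabulary only because it has more text; on equal-length samples the two groups come out the same.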

We also used Google Fusion Table to map the locations where Obama gave speeches from 2008 to 2012. In our dataset, we found no speeches given in Africa or in some East Asian countries, including China. This was a little surprising to us, given how much stronger global connections are nowadays.

Then we turned our attention to the speeches made inside the U.S. We combined the text of the speeches given in each state and used each state’s corpus in Voyant to find the word Obama used most frequently there. We believed that the most frequent words would best represent the relationship between topics and locations. We were inspired by the visualization in the Dubois show and created a map similar to the one Dubois created. To avoid overlapping words, we placed the locations with more speeches, like D.C., first, and disregarded words like “people” that are the most frequent word in many speeches. To better understand the relationship between location and topic, we used Google Fusion Table to create a chart in which blue dots represent locations and yellow dots represent topics. Looking at the two visualizations together, we found that they are consistent with each other. For example, the key word in New York is Romney, and many of the dots connected to the yellow Political dot are cities in New York. That is because many speeches about the presidential election were given in New York in 2012. Although the map shows only a weak relationship between topics and locations, we find it extremely interesting and attractive because the words on the map tell some beautiful stories. For example, from the word “Father” located in Indiana, we felt President Obama’s happiness at being a father and giving the speech as a father.

The last visualization we made includes every single word. We used Gephi to make a circular pattern. The closer a word is to the center of the circle, the more frequently Obama used it. We also used different colors to indicate the year in which words were used. The thickness of each ring represents how many words were used in that year. For example, the thinnest ring belongs to 2008: we found only 4000 words for 2008, compared to more than 60000 words for 2011, so the 2008 ring is hard to see. We also noticed something interesting: several words were used only once in the five years, and they sit outside the main circular pattern.

We created a website that includes our visualizations to give the audience a clearer and more organized view of our research on the code-switching strategy that Obama used. We think our visualizations go some way toward answering our research questions about Obama’s speeches. Through this project, we better understand President Obama’s vocabulary and choice of topics based on locations and audiences.

Assignment 5 – JZ (in collaboration with Yifu Qu)

The dataset includes information on Nobel Peace Prize winners from 1994 to 2004. The columns in the file are titled “year”, “Name”, “Birth Place”, “Age at Prize Year”, “Profession” and “Image URL”.

Because we wanted to compare three visualization tools (Palladio, Google Fusion Table, and Gephi), we focused on the relationship between the genders and professions of these Nobel Peace Prize winners.

Using Palladio to analyze the Penn museums dataset last time was not an enjoyable experience for me because the dataset was too large, so Palladio ran very slowly and the result was messy. The file this time is much smaller, and I found that Palladio worked really well.


It is easy to choose gender and profession from the dropdowns, and we can see that more of the men have political careers, while there is a broader diversity of professions among the women. The best part of Palladio is the sizing of the dots. Larger dots represent larger numbers of people, so it is obvious that more men won the prize and that organizations make up the smallest share.

Google Fusion Table can also make a nice visualization of the relationship between genders and professions.


I still really appreciate the colors used in the chart. Since blue represents gender and yellow represents profession, the two are easier to tell apart than the black and grey in Palladio. However, the Google Fusion Table chart is missing one important part: the organizations. I am a little confused about why they are missing because, even though organizations cannot be defined as male or female, they are still labeled N/A in the raw file.

When we first imported the dataset into Gephi, we were surprised by how messy the data became.


We realized that Gephi initially shows all the information in the file, so we deleted the column “Image URL”. But there was another problem: Gephi could not recognize a person’s full name. For example, “Yizhak Rabin” became two names in Gephi, so we changed all the spaces in names to “-”, as in “Yizhak-Rabin”, and it worked pretty well.
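We made these changes by hand, but the same cleanup could be scripted. Here is a minimal sketch using Python’s csv module, assuming the column names from our file; the function name `clean_for_gephi`, the sample row, and the example.com URL are placeholders made up for illustration.

```python
import csv
import io

def clean_for_gephi(rows, drop_columns=("Image URL",), name_column="Name"):
    """Drop unwanted columns and replace spaces in names with hyphens."""
    cleaned = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k not in drop_columns}
        if name_column in new_row:
            # "Yizhak Rabin" -> "Yizhak-Rabin" so the name stays one token.
            new_row[name_column] = "-".join(new_row[name_column].split())
        cleaned.append(new_row)
    return cleaned

# Toy CSV with a placeholder URL (not the real file contents).
raw = io.StringIO(
    "year,Name,Image URL\n"
    "1994,Yizhak Rabin,http://example.com/rabin.jpg\n"
)

rows = clean_for_gephi(csv.DictReader(raw))
print(rows)
```

With the real file, `raw` would be replaced by the opened csv file, and the cleaned rows could be written back out with `csv.DictWriter` before importing into Gephi.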


We changed the layout to “Yifan Hu” and ran it. We got a visualization much like the one created with Palladio. The biggest difference is that the Gephi visualization shows all the other information in the file, including “Age at Prize Year”, and we could not find a way to remove the extra information we were not planning to look at.


However, I found it really impressive that when I move the mouse over a dot, the related information immediately stands out because everything else becomes transparent. For example, when I move the mouse over the dot that represents “Yasser Arafat”, I can easily see that he won the prize in 1994 and was a chairman at that time. Another thing I like about Gephi is its Data Laboratory, which makes it easy to add or clear a column while working. I think Palladio and Google Fusion Table may be better for dealing with large datasets, while Gephi is really good for finding relationships among more than two elements.

The process of constructing a dataset and creating iterative visualizations brings out the different advantages of otherwise similar visualization tools. Even subtle details in a visualization can strongly influence how the data is conveyed to the audience, so choosing a suitable visualization tool carefully is essential.

Assignment 3 – JZ

I chose the Penn museums dataset. Because the full dataset is so big that both Palladio and Google Fusion Table ran very slowly with it, I used the part of the dataset about European museums. I feel that Palladio and Google Fusion Table are both good at analyzing this kind of raw material and can create useful visualizations. I planned to research the relationship between the time periods and the cultures that the museums focus on. While playing around with Palladio and Google Fusion Table, I also found that the materials the museums used interest me a lot. I think this could be one of the attractions of data visualization: it continually brings me new discoveries that I had not paid attention to before.


After I loaded the data into Palladio, I started by finding the major time periods of these European museums. The Data tool in Palladio is really helpful: when I sort by frequency, I can immediately see how the museums’ building times differ. Then I can search “Neolithic” to see the smaller differences within the “Neolithic” period.


When I switched to another element, “Iconography”, and downloaded the list of icons the European museums used, it really caught my eye and motivated me to do deeper research on old European museums.


I also found the facet tool in Palladio very interesting. It can arrange the data in different ways to show the frequencies according to different elements.


Using the graph tool to relate time periods and cultures creates a clear and organized network, but it left me somewhat confused about how the lines relate to each other.


Google Fusion Table

I think Google Fusion Table is really cool because it makes a complex dataset really concise. When I was playing with the Cards tool in Google Fusion Table, I found it really useful and convenient for adjusting the format of the cards. I can choose how many cards are shown at the same time, and I can also remove information that I think is not useful. Although the Gallery tool in Palladio is more aesthetic, the Cards make it easier to analyze the materials.


Using the Summary tool in Google Fusion Table is really engaging. Different colors represent the different elements that I choose. In the picture, blue dots represent time periods and yellow dots represent cultures. It becomes easier for me to find the key period “Modern” since I only need to look at the blue dots. Moreover, when I click a dot like “Modern”, the lines connected to it become bold. I think this function helps me find relationships between two elements.

It is hard to say which is better, Palladio or Google Fusion Table. It is even harder to say whether I prefer one to the other. Palladio’s design is very beautiful, and when I was playing with it in class, the gallery with pictures caught my attention immediately. But Google Fusion Table is so clear that it makes researching and analyzing data enjoyable. The visualizations that Google Fusion Table creates can actually explain the existing data and encourage me to dig deeper into it.