Final project reflection


Our team’s research question is whether Obama actually uses code-switching in his speeches when he addresses audiences of different classes, races, and ethnic groups. I feel obliged to share this YouTube video with my readers. Although it exaggerates how code-switching is used, it is still an excellent example of how the technique can appear in real life.


President Obama has drawn public and media attention ever since he was elected president of the United States in 2008. As the first African-American president of the United States, he became an embodiment of black culture. Code-switching originally refers to frequent, rapid switching between two or more distinct languages (Wikipedia). In our project, however, we adopt a broader definition: it also covers the subtle, reflexive changes in how people express themselves in different situations. The project first performs a general linguistic analysis and then attempts to find evidence of cases in which code-switching was used in his speeches.


Our project assumes the audience has no background in sociology or linguistics. All terminology and abstract ideas are explained in a way that everyone can understand. All visualizations are digital, and we post our work on a website that is accessible to anyone, anywhere in the world. The whole website is designed in a storytelling fashion so that the audience follows exactly the steps we took to reach our conclusion. We believe this is a more persuasive way to help people understand the ideas behind our work, and a more interesting way to express them.


Most visualizations combine interactive and static views. The visualizations in Voyant, Gephi, and Google Fusion Tables have interactive features that let the audience explore on their own. We chose to post static snapshots of the Gephi and Voyant visualizations first, to give the audience a general understanding of them. Readers can then explore further by clicking the links behind the snapshots.

All the data we used, mostly speeches by President Obama, come from this website. We first processed all the speeches to extract metadata: the location, audience, topic, and time of each speech. This lets us analyze the speeches along several dimensions and so perform a more comprehensive analysis.


The first analysis we performed was a word frequency analysis, done in Voyant. We first divided the data into groups, classified by time, audience, and topic. I removed some words from the word clouds to get more representative results. Words such as ‘i’, ‘they’, and ‘god’ appear in almost all of his speeches and carry no special meaning in any particular scenario. An example of a visualization from Voyant looks like this:


Voyant Visualization of 2012

This is the word cloud for all speeches in 2012. One of the most prominent words in it is “romney”. This makes sense, since 2012 was a presidential election year and Romney was Obama’s strongest opponent at that time. On the right side of the visualization, we can also find the word “tax”. This is also representative, since Obama was proposing several tax reforms, such as higher taxes on high-income taxpayers and lower taxes for startups and small businesses.


This is another visualization from Voyant. This word cloud contains all words under the category ‘Military’, which covers the speeches that President Obama gave to military personnel. It is pretty self-explanatory that the most prominent words are ‘iraq’ and ‘security’. In general, Voyant on its own cannot give us firm conclusions. This is due to the nature of the corpus: word clouds only display words by frequency, and there is no necessary correlation between the importance of a word and how many times it appears in the corpus. Words like ‘I’, mentioned above, do not help us grasp the essence of the speeches. Also, most words in the word clouds are nouns, and it is hard to read his attitudes from nouns. Verbs and adjectives are more useful here, and Voyant is not good at selecting words by their function. However, it is still helpful to some degree. Both visualizations show that he used different vocabularies in different situations, which suggests that he adapts his word choice to the scenario.


Voyant Visualization for military personnel
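The counting behind a word cloud can be sketched in a few lines of Python. This is only an illustration of the idea, not how Voyant is implemented: the sample sentences, the `STOPWORDS` set, and the function name `word_frequencies` are all made up for this sketch, and only the three removed words named above come from the actual analysis.

```python
import re
from collections import Counter

# Hypothetical stopword list; the post only names 'i', 'they' and 'god'.
STOPWORDS = {"i", "they", "god"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase words in text, skipping the stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# Made-up snippets standing in for two speech categories.
speeches = {
    "military": "They serve in Iraq. Security in Iraq matters.",
    "campaign": "I will cut the tax. They want a fair tax.",
}

for category, text in speeches.items():
    # Show the two most frequent remaining words in each category.
    print(category, word_frequencies(text).most_common(2))
```

Even in this tiny example a function word like "in" is counted as often as "iraq", which is the same limitation described above: frequency alone does not tell us which words matter.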


The next series of visualizations analyzes the relationship between topic and location. Although, once again, it does not provide direct proof of code-switching, it shows that locations are sometimes specifically selected by President Obama and his team for certain topics. This is one of the visualizations:


Keywords classified by state

This visualization displays the keywords of his speeches grouped by state, and it gives us some interesting results. For example, in states like Mississippi, Alabama, Georgia, and South Carolina, which have relatively higher percentages of African American residents than other states, the keywords are words such as ‘Hope’, ‘Change’, and ‘Affect’, which are all positive and share a similar idea. Considering these locations, I do not think this is just a coincidence. I think President Obama and his team are aware of the large share of African American residents in these states, and he knows these are exactly the words that can excite African Americans and win their support. From this example, we can see that code-switching depends on both location and topic. Different locations have different concentrations of population, which can be defined by race, ethnic group, class, and so on. Different topics, most of the time, target a specific group of the population. Combining location and topic, we expect different styles of speech, each meant to appeal to a specific group of people.


The last visualization was made with Gephi:


This visualization consists of all the words from his speeches over five years (2008 to 2012). It looks like the annual rings of a tree. The center, where most nodes are condensed, holds the words used most frequently across all five years. This concentration suggests that there is a core vocabulary that President Obama uses in most speeches. In the outer area, we can see rings of different colors overlapping each other. These are words that appear mostly in a certain year and are not distributed evenly across the five years. We know from the earlier Voyant visualizations that President Obama focused on different topics each year, and these words most likely address those issues in particular. We read this as direct evidence of code-switching. The words that are used only in a specific location, at a specific time, and in front of a specific audience best exemplify how President Obama adopts code-switching. We would definitely like to investigate further and test different visualizations in Gephi if we get the chance.
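The split between a shared core and year-specific words can also be sketched with plain set operations. This is not how Gephi lays out the graph; it is a minimal illustration of the same distinction, and the word lists and the function name `split_vocabulary` are invented for the example.

```python
def split_vocabulary(words_by_year):
    """Return (core, unique): core is the set of words used in every year,
    unique maps each year to the words used in that year only."""
    vocab = {year: set(words) for year, words in words_by_year.items()}
    core = set.intersection(*vocab.values())
    unique = {}
    for year, words in vocab.items():
        # Everything said in any other year.
        others = set().union(*(w for y, w in vocab.items() if y != year))
        unique[year] = words - others
    return core, unique

# Made-up word lists, not taken from the real corpus.
sample = {
    2008: ["america", "change", "hope"],
    2010: ["america", "health", "jobs"],
    2012: ["america", "romney", "tax", "jobs"],
}
core, unique = split_vocabulary(sample)
print("core:", sorted(core))
print("2012 only:", sorted(unique[2012]))
```

In this toy data the core corresponds to the dense center of the graph and the per-year sets correspond to the colored outer rings.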

In general, I think it is now fair to say that President Obama adopts code-switching. There are several reasons why people code-switch, whether intentionally or not. One of them is trying to fit in, and we can see this in the location vs. topic visualizations. President Obama tries to fit in with African American communities by using different vocabularies and selecting the topics that resonate most with local audiences. Code-switching helps President Obama and his team convey their thoughts to diverse audiences and attract voters from different backgrounds. Our project demonstrates this idea through multiple cases and examples, and we hope our audience will also recognize that code-switching is used broadly by President Obama in his public speeches.


Assignment 5

Since I had been using the default data set from Palladio, which is not ideal for testing Gephi, I chose a different data set this time. It contains information on 50 movies selected from the top 250 movies on IMDb. The raw data is in CSV format and looks like this:

Screenshot of the raw CSV data


It contains various information, including year of production, director, genre, and so on. Since there is no strong geographical connection between the movies, I do not think Palladio can be put to good use here, so I chose Google Fusion Tables. Under the “Chart” option I can get a network graph, but it does not work well on my dataset:

Screenshot of the Google Fusion Tables network graph

Since Google Fusion Tables only supports making connections between two columns, the graph does not have much value.

Besides the network graph, the card view can be useful when the audience wants to learn about specific movies. The information on each movie is collected and displayed individually, which is good for quick reference. However, I can hardly find any interconnections among the movies, due to the nature of this type of visualization.

Gephi, on the other hand, did a good job of displaying the network relations among the movies. These movies are related in all kinds of ways. Sometimes one movie was inspired by other movies that already had a reputation, and sometimes certain directors consistently made good movies. Relationships like these are best represented by a network. This time, I chose 50 movies out of the 250 and constructed the network from the relationship between movies and their genres. The visualization looks like this:

Screenshot of the Gephi movie and genre network

I first loaded all the movies and genres into Gephi and then connected them according to each movie’s genres. The final result is an undirected graph with unweighted edges. It looks great and provides a lot of useful information on the relation between the most successful movies and their genres. For example, we can see that a large number of movies fall under the genre “Drama” and that the most successful movies usually cover multiple genres. Another interesting fact about this graph is that it shows the similarities between movies. For example, “Star Wars: Episode V – The Empire Strikes Back” is connected with other movies such as E.T. and 2001: A Space Odyssey, which indicates that they have similar genres and are thus potentially favored by the same group of viewers.
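The structure of this graph can be sketched in Python: one undirected, unweighted edge per movie and genre, with two movies counted as similar when they share a genre node. The genre assignments below are illustrative and not copied from my CSV file, and the names `build_edges` and `shared_genres` are made up for this sketch.

```python
from itertools import combinations

# Illustrative genre assignments; not copied from the actual CSV file.
movie_genres = {
    "The Empire Strikes Back": {"Action", "Adventure", "Sci-Fi"},
    "E.T.": {"Adventure", "Family", "Sci-Fi"},
    "2001: A Space Odyssey": {"Adventure", "Sci-Fi"},
    "The Godfather": {"Crime", "Drama"},
}

def build_edges(movie_genres):
    """One undirected, unweighted edge per (movie, genre) pair."""
    return sorted((m, g) for m, genres in movie_genres.items() for g in genres)

def shared_genres(movie_genres):
    """Map each pair of movies to the genres they have in common."""
    pairs = {}
    for a, b in combinations(sorted(movie_genres), 2):
        common = movie_genres[a] & movie_genres[b]
        if common:
            pairs[(a, b)] = common
    return pairs

edges = build_edges(movie_genres)
print(len(edges), "edges")
for pair, common in shared_genres(movie_genres).items():
    print(pair, sorted(common))
```

Movies that share no genre, like the crime drama in this toy data, never appear in a pair, which matches what the graph shows visually: they sit in a different part of the network.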

Personally, I do not think Google Fusion Tables and Gephi have much in common. Since this data set contains no geographical information, Fusion Tables is mostly limited to the list view and the card view, which do little to help construct a network. Gephi is much better at creating networks and is thus a good fit for my data set. What is important when using Gephi is that the author needs a clear understanding of what kind of connection the visualization is trying to represent. In this visualization I simply made both movie titles and genres nodes, but ideally genres should be attributes of the movie nodes. Modeled that way, we could take fuller advantage of Gephi’s statistical analysis of the data set, such as centrality and partitioning measures.

Due to the nature of networks, we should not show all relationships at once. An overly complicated graph becomes unreadable and thus less useful. We should pick only the relationships we are curious about and connect nodes only when needed. The graph I got is not heavily centralized, because every film has at most three or four genres, and a small number of features means a low degree for each movie node. Gephi can calculate the modularity of a graph, which further supports my point:

Screenshot of the Gephi modularity report

It shows that the graph is only slightly centralized and, in general, spread out evenly.
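For readers curious about what this statistic measures, here is a minimal sketch of the standard Newman modularity score for an undirected, unweighted graph. It scores how strongly a graph divides into groups: values near zero mean little group structure, larger values mean denser groups. Gephi also searches for a good partition itself, while this sketch only scores a partition supplied by hand. The toy graph and the function name `modularity` are assumptions made for the example.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict node -> community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    """
    m = len(edges)
    inside = defaultdict(int)   # L_c: edges with both ends in community c
    degree = defaultdict(int)   # d_c: total degree of the nodes in c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping every node into one group, which is the sense in which a higher value indicates clearer clusters.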


When I was preparing the data set, I just tried to find as much information as I could, but later on I found that the result meant nothing. I did not have a clear idea at the very beginning of what kind of relationship I was interested in, and such mindless searching made the graph useless. As Lima said, “if you don’t really know why you are collecting it, you are hoarding it.” (Lima, p. 82) This is probably the biggest mistake I made when preparing the data. It is the quality of the data, rather than the quantity, that gives a graph its value. The author should not be greedy and try to include everything in one graph.






I chose the sample dataset from Palladio.

In Palladio, there are four different views. The first one is Map:

Screenshot of the Palladio Map view

Data can be presented on a world map according to the geographical information provided. Each node represents one piece of data, and nodes can be connected by lines where there is a potential logical connection. This view helps users visualize possible interconnections among the different pieces of information in the dataset. In this example, it displays the places where each person was born and died.

The second one is Graph:

Screenshot of the Palladio Graph view


This is pretty much like the one in Jigsaw. Users pick a source and a target, and Palladio then finds the connections between them and links them with lines. The size of the dots can be scaled according to quantity or magnitude. In this case, I chose arrival point as the source and birthplace as the target. It is easy to see that everyone in this data set arrived in Monaco.


The third one is Table:

Screenshot of the Palladio Table view

This one is pretty self-explanatory. Users first choose a row dimension and then fill in the related information accordingly. It is a convenient tool that helps users organize the data the way they want. Just like other spreadsheet tools, it also lets users sort the table by a given column; in our case, we can sort all the data alphabetically or chronologically.

The last one is Gallery:

Screenshot of the Palladio Gallery view

This one is just fancy. It is similar to the table but organizes the data in another way. It collects facts about different entities in a flash-card-like view and lets users sort and organize the data in different ways for quick reference. In this example, it simply displays each person’s basic information and portrait.

Google Fusion Tables has features similar to Palladio’s. The first one is Rows:

Screenshot of the Google Fusion Tables Rows view

It looks premade and is less customizable than the table in Palladio. It dumps all the information into one spreadsheet, which can be helpful for quick lookup.

The second one is card view:

Screenshot of the Google Fusion Tables card view

It is almost the same as the one in Palladio but actually more powerful. In Palladio, users can select only a limited amount of information to appear on each card. Google Fusion Tables, again, just brutally dumps everything onto the card, in a good way.


The last one is map view:

Screenshot of the Google Fusion Tables map view

It combines the visualization of data based on geographical information with the details of each entry. This makes Google Fusion Tables more powerful and certainly saves users the time of switching back and forth between visualizations.

After using Palladio and Google Fusion Tables, I am starting to understand Drucker’s idea of interpretation in visualization. Palladio in particular gives users a great deal of freedom to combine and connect different pieces of data, regardless of whether the connections make sense. She writes in her paper that “Subjective meteorology is an elaborate and idiosyncratic system.” How to use such tools properly could be an interesting topic, because sometimes we do need to run some exploratory tests to rule out the connections that do not make sense and keep those worth further investigation. Although I do not agree with her opinion, it is still a fair warning about what we might face when we work on our own project.