Bucknell Curriculum Visualization

When high school students begin the college search, they are repeatedly bombarded with the same information about class size, department strength, learning goals, etc. from every university they encounter.  Each institution, in the interest of attracting students to apply, wants to put its best foot forward.  Understanding this motive behind the information Bucknell (as well as other colleges) makes publicly available on its website invites further scrutiny: does the information change once students commit to Bucknell?

The Bucknell mission statement, learning goals, college core curriculum (CCC) goals, and department summaries are available to anyone on the university website.  All of this information essentially communicates the same thing: enabled by a Bucknell education, students grow into more mindful, critically thinking, capable, creative, and culturally aware contributing members of the global community.  Does the information only available to people with a Bucknell login, such as course descriptions and the specific classes that fill particular CCC requirements, carry the same content and cadence?  Is the public face of Bucknell, constructed through its publicly accessible website information, representative of a Bucknell student’s educational reality?

My personal stake in this research has to do with the difficulty I had selecting a major.  Every adviser tells incoming freshmen to take their time exploring and to start by filling general education requirements before settling into a major.  I was told I had plenty of time to decide, but when the time came to declare a major I didn’t feel as though twelve credits’ worth of experience was enough to go on.  Coming from a fairly generic high school, I had no idea what it would mean to be an anthropologist, economist, creative writer, or comparative humanist because I had no experience and knew of no one who had experience in these fields.  If the publicly accessible department descriptions are not truly representative of the field, more pressure falls on course selection as the way for students to gain insight into a branch of knowledge.  But how can students be expected to choose courses they will enjoy and gain meaningful experiences from if the selection process is a gamble?

I began with a specific interest in the materials studied in the three comparative humanities core courses.  Visualizing genre and author/artist gender and ethnicity drew attention to the gaps in the courses’ coverage, specifically a lack of women and non-Western authors.  (Visualizations below created in Palladio: on the left a graph view dividing the course materials based on gender, on the right a map view plotting the materials’ location of publication.)

palladio graph author sex     palladio map sized

From there, I became interested in broadening the scope of the visualization to the university as a whole.  Since I do not have access to all the syllabi in every department, I had to shift the focus of the visualization to a different, but related, set of data: course descriptions and requirements as seen in the online course catalog.  This data is especially intriguing because, although it is easily accessible for all Bucknell students making choices about which classes to take, its presentation (a glorified spreadsheet) is indigestible and makes comparison difficult.  My goal was to find a way to view all, or as much as possible, of the data at once in order to access a macro-perspective.  Initially I planned to use Stefanie Posavec’s “Writing Without Words” (below left) as a guide for the tree-like structure I wanted to create.  As “Writing Without Words” reveals Kerouac’s structural style in On the Road, I thought a similar design could reveal the structure of Bucknell’s course offerings.  After some experimentation, I realized my data appeared confusing and sloppy in such a format.  Instead, I borrowed the circular structure of Boris Müller’s “Poetry on the Road” (below right) to give shape to my data.

writing without words                   poetry viz

The “Poetry on the Road” model enabled me to more closely follow Tufte’s principles of display architecture, which include: “(1) documenting the sources and characteristics of the data,” which the visualization accomplishes through its shape, designed to reflect the relationships between departments via CCC requirements; “(2) insistently enforcing appropriate comparisons,” made possible through the various options for node sizing; “(3) demonstrating mechanisms of cause and effect,” by the simple organization of data into the democratic, circular structure in which the viewer’s eye is not drawn to a particular area for any reason other than the concentration of edges; “(4) expressing those mechanisms quantitatively,” as I did by sizing and connecting each node based on quantitative data from the course catalog; “(5) recognizing the inherently multivariate nature of analytic problems,” shown through the combination of variables such as node color, size, and location, and different CCC requirements; “and (6) inspecting and evaluating alternative explanations,” as we explore in Nadeem’s interactive network visualization for each department (Tufte 53).


Inspired by “Poetry on the Road,” I organized all of Bucknell’s academic departments into rings based on the size of each College/School (above).  The outer two rings, with nodes colored purple, represent the College of Arts and Sciences.  Since the College is so big, I split it further into an Arts and Humanities ring and a Science (hard and social) ring in order to make the visualization easier on the eyes.  The center ring, with red nodes, represents the College of Engineering.  The inner ring, with blue/green nodes, represents the School of Management.  In this particular visualization I chose to size nodes based on the number of unique courses offered in each department for the Fall 2015 semester.  For example, the music department has the highest number of unique courses (73) so it is represented by the largest node, and astronomy is one of the departments tied for the lowest number of unique courses (1) so it is represented by the smallest node.  I initially intended to make node size a variable for comparison by creating alternative visualizations with nodes sized based on the total number of courses offered or the number of possible ways to fill CCC requirements in a particular department, but altering node size did not fit seamlessly into the narrative of the project as a whole.
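The sizing and ring placement described above can be illustrated with a minimal Python sketch. It is not the tool used for the actual visualization: the ring radii, the size range, the ordering of the two outer rings, and every course count other than music (73) and astronomy (1) are assumptions made up for the example.

```python
import math

# Hypothetical sample: (department, ring, unique courses). Only the music (73)
# and astronomy (1) counts come from the text; the rest are invented.
DEPARTMENTS = [
    ("Music", "arts_humanities", 73),
    ("English", "arts_humanities", 40),
    ("Astronomy", "sciences", 1),
    ("Economics", "sciences", 30),
    ("Civil Engineering", "engineering", 20),
    ("Management", "management", 25),
]

# Assumed ring radii, from the outermost ring to the innermost.
RING_RADIUS = {
    "arts_humanities": 4.0,
    "sciences": 3.0,
    "engineering": 2.0,
    "management": 1.0,
}


def node_size(count, min_count, max_count, min_size=5.0, max_size=50.0):
    """Linearly scale a unique-course count into a node diameter."""
    if max_count == min_count:
        return max_size
    fraction = (count - min_count) / (max_count - min_count)
    return min_size + fraction * (max_size - min_size)


def layout(departments):
    """Return {department: (x, y, size)} with nodes spaced evenly on each ring."""
    counts = [count for _, _, count in departments]
    low, high = min(counts), max(counts)

    by_ring = {}
    for name, ring, count in departments:
        by_ring.setdefault(ring, []).append((name, count))

    positions = {}
    for ring, members in by_ring.items():
        radius = RING_RADIUS[ring]
        for index, (name, count) in enumerate(members):
            angle = 2 * math.pi * index / len(members)
            positions[name] = (
                radius * math.cos(angle),
                radius * math.sin(angle),
                node_size(count, low, high),
            )
    return positions


if __name__ == "__main__":
    for name, (x, y, size) in layout(DEPARTMENTS).items():
        print(f"{name:18s} x={x:6.2f} y={y:6.2f} size={size:5.1f}")
```

Swapping the count column for total courses, or for the number of ways to fill a CCC requirement, would give the alternative sizings mentioned above without changing the layout.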

circle.unique.allpub  circle.unique.CCQR.DUSCpub

Since my intention was to create a means to view as much of the course catalog information at once as possible, I first tried to map the edges for all the CCC requirements at once (above left).  Although it made for a decent website header image, the colorful quagmire is too cluttered to be analytically useful.  Even including as few as two CCC requirements on the same image does more harm in the clarity department than it does good for comparison purposes (above right, Quantitative Reasoning and Diversity in the US requirements pictured).

ARHC with nodes  ARHC

Although visualizing one CCC requirement at a time on top of the department nodes is simple enough to convey the data clearly, I decided to simplify even further by removing the nodes (Arts and Humanities requirement pictured above).  It became necessary to include a template of the nodes without any CCC requirements under the narrative tab in order for the visualization to make sense, but the visualization is still legible because the division of the different rings is intuitive enough to grasp without looking directly at the location of the nodes.  The image is also more visually impactful with just the edges.

macro–>  relationship –> micro

When it came time to combine the static and interactive aspects into a single visualization with a reasonably linear narrative, we decided to use the macro>relationship>micro view structure.  Starting with a macro view, a visualization will “facilitate the understanding of the network’s topology, the structure of the group as a whole, but not necessarily of its constituent parts” through a holistic view of the visualization, enabling users to see its overall pattern (Lima 91).  Our macro view is located in the narrative (above left).  It offers both an overview of Bucknell’s academic structure through the listing of learning goals and college core curriculum design taken directly from Bucknell’s website, and a color-coded comparison of Bucknell’s learning goals to its CCC design.  This choice contextualizes the visualization for viewers who may not be familiar with Bucknell’s academic mission.  From the narrative tab, the viewer is prompted to select the college core curriculum tab to access the relationship view (above center), which “is concerned with an effective analysis of the types of relationships among the mapped entities” (Lima 92).  The edges of our static relationship view offer a perspective on the relationships between different departments through CCC requirements.  Finally, the user can click on a node to explore a single department in more depth in the micro view (above right).  Although the micro view offers the narrowest perspective, it offers comprehensive, explicit, and “detailed information, facts, and characteristics on a single-node entity,” which helps to “clarify the reasons behind the overall connectivity pattern” (Lima 92).



Curriculum visualization (Nadeem Nasimi’s) http://nadeem.io/270/

Final project reflection


Our team’s research question is whether Obama actually uses code-switching in his speeches when addressing audiences of different classes, races, and ethnic groups. I feel obliged to share this YouTube video with my readers. Although it shows an exaggerated version of code-switching, it is still an excellent example of how the technique can be used in real life.


President Obama has drawn public and media attention since the day he was elected president of the United States in 2008. As the first African-American president of the United States, he has become an embodiment of black culture. Code-switching originally referred to frequent, immediate switching between two or more distinct languages (Wikipedia). In our project, however, we adopt a broader definition: it also covers the subtle, reflexive changes in the way people express themselves in different situations. The project first performs a general linguistic analysis and then attempts to find evidence of cases in which code-switching was used in his speeches.


Our project assumes the audience has no background in sociology or linguistics. All terminology and abstract ideas will be explained in a way that everyone can understand. All visualizations are digital, and we post our work on a website that is accessible to anyone, anywhere in the world. The website is designed as a story, so that the audience follows the exact steps we took to reach our conclusion. We believe this is both a more persuasive way to help people understand the ideas behind our work and a more interesting way to present them.


Most visualizations combine interactive and static views. Most visualizations in Voyant, Gephi and Google Fusion Tables have interactive features that allow the audience to explore on their own. We chose to first post static snapshots of the Gephi and Voyant visualizations to give the audience a general understanding. Readers can explore further by clicking the links behind the snapshots.

All the data we used, mostly speeches by President Obama, come from this website. We first processed all the speeches to extract metadata: the location, audience, topic, and time of each speech. This metadata lets us analyze the speeches along several dimensions and therefore more comprehensively.


The first analysis we performed was a word frequency analysis, done in Voyant. We first sorted the speeches into groups by time, audience, and topic. I removed some words from the word clouds to give more representative results: words such as ‘i’, ‘they’ and ‘god’ appear in almost all of his speeches and carry no special meaning in any particular scenario. An example of a Voyant visualization looks like this:


Voyant Visualization of 2012

This is the word cloud for all speeches in 2012. One of the most prominent words in it is “romney”. This makes sense, since 2012 was a presidential election year and Romney was Obama’s main opponent. On the right side of the visualization, we can also find the word “tax”. This is also representative, since Obama was proposing several tax reforms, such as higher taxes on high-income taxpayers and lower taxes for startups and small businesses.


This is another visualization from Voyant. This word cloud contains all the words in the ‘Military’ category, that is, speeches that President Obama gave to military personnel. Unsurprisingly, the most prominent words are ‘iraq’ and ‘security’. In general, Voyant alone cannot give us firm conclusions. This is due to the way word clouds treat the corpus: they display words only by frequency, and there is no necessary correlation between the importance of a word and how many times it appears in the corpus. Words like ‘I’, mentioned above, do not help us grasp the essence of the speeches. Also, most words in the word clouds are nouns, and it is hard to read his attitudes from nouns. Verbs and adjectives are more useful for that, and Voyant is not good at selecting words by their function. It is still helpful to some degree, however. Both visualizations show that he used different vocabularies in different situations, which suggests that he adapts his word choice to the audience and occasion.


Voyant Visualization for military personnel
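The frequency counting behind a word cloud can be approximated in a few lines of Python. This is only a sketch of the general technique, not what Voyant does internally: the sample sentence is invented, and the stop list contains the three words named above plus a handful of function words added as an assumption.

```python
import re
from collections import Counter

# 'i', 'they' and 'god' are the words named in the post; the remaining
# entries are assumed additions for the sake of the example.
STOP_WORDS = {"i", "they", "god", "the", "a", "and", "to", "of", "we", "is", "in", "on"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase word tokens, skipping anything on the stop list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(token for token in tokens if token not in stop_words)


# Invented sample text, not taken from any actual speech.
SAMPLE = "We will cut the tax on small businesses. Romney and I disagree on tax."

if __name__ == "__main__":
    print(word_frequencies(SAMPLE).most_common(3))
```

Running the same function over each group of speeches (by year, audience, or topic) gives the per-group counts that the word clouds display. It also shows the limitation discussed above: the counter knows how often a word occurs, not whether it is a noun, a verb, or an important idea.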


The next series of visualizations analyzes the relationship between topic and location. Although, once again, it does not provide direct proof of code-switching, it shows that locations are sometimes specifically selected by President Obama and his team for certain topics. This is one of the visualizations:


Keywords classified by state

This visualization displays the keywords of his speeches grouped by state, and it gives some interesting results. For example, Mississippi, Alabama, Georgia and South Carolina have relatively higher percentages of African American residents than other states. The keywords there are words such as ‘Hope’, ‘Change’ and ‘Affect’, which are all positive and share a similar idea. Considering these locations, I do not think this is a coincidence. I think President Obama and his team were aware of the large African American populations in these states and believed these were exactly the words that would excite African American voters and win their support. From this example, we can see that code-switching depends on both location and topic. Different locations have different concentrations of population, defined by race, ethnic group, class, and so on. Different topics, most of the time, target a specific group of the population. Combining location and topic, we expect different styles of speech designed to appeal to specific groups of people.


The last visualization was made in Gephi:


This visualization consists of all the words from his speeches over five years (2008-2012). It looks like the annual rings of a tree. The center, where the nodes are most densely packed, holds the words used most frequently across all five years. This concentration suggests that there is a core vocabulary that President Obama uses in most speeches. In the outer area, rings of different colors overlap one another. These are words that appear mostly in a certain year and are not distributed evenly across the five years. We know from the earlier Voyant visualizations that President Obama focused on different topics each year, so these words most likely address those issues in particular. We take this as direct evidence of code-switching: the words used only in a specific location, at a specific time, and before a specific audience best exemplify how President Obama adopts the technique. We would like to investigate further and test different visualizations in Gephi if we get the chance.
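The distinction between a shared core vocabulary and year-specific words can be made concrete with a small Python sketch. It does not reproduce the Gephi graph or its layout; the three one-line "speeches" are invented, and treating a word as "core" only when it appears in every year is an assumption chosen for simplicity.

```python
from collections import defaultdict


def words_by_year(speeches):
    """Map each word to the set of years in which it appears."""
    years = defaultdict(set)
    for year, text in speeches:
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'")
            if word:
                years[word].add(year)
    return years


def split_core_and_yearly(years_for_word, all_years):
    """Separate words used in every year from words used in exactly one year."""
    every_year = set(all_years)
    core = {word for word, seen in years_for_word.items() if seen == every_year}
    yearly = {
        word: next(iter(seen))
        for word, seen in years_for_word.items()
        if len(seen) == 1
    }
    return core, yearly


# Invented sample data standing in for the real speeches.
SPEECHES = [
    (2008, "Hope and change for America"),
    (2009, "Jobs for America"),
    (2012, "Romney and jobs for America"),
]

if __name__ == "__main__":
    core, yearly = split_core_and_yearly(words_by_year(SPEECHES), {2008, 2009, 2012})
    print("core:", sorted(core))
    print("year-specific:", yearly)
```

In the terms of the visualization, the `core` set corresponds to the dense center and the `yearly` mapping to the colored outer rings.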

Overall, I think it is fair to say that President Obama code-switches. There are several reasons why people code-switch, whether intentionally or not. One of them is trying to fit in. We can see this in the location vs. topic visualizations: President Obama tries to fit in with African American communities by using different vocabulary and selecting the topics that best resonate with local audiences. Code-switching helps President Obama and his team convey their ideas to diverse audiences and attract voters from different backgrounds. Our project demonstrates this through multiple cases and examples, and we hope our audience will also recognize that President Obama uses code-switching widely in his public speeches.


Final Project- Yifu

Our final project is called “Poker face”, and it is an analytical study of Obama’s speeches from 2008 to 2012. Jiaming and Jinbo were both very helpful and thoughtful throughout the project.

  • The Start

The first question that attracted us to this field of study was “How does a president speak in different situations?” We thought it would be interesting and meaningful to look at how the speeches vary with the circumstances in which they were given. Later, on the advice of Professor Faull, we decided to also focus on the topics that President Obama talked about around the world.

  • The Corpus

The corpus consists of almost 200 speeches from the five years from 2008 to 2012. The first thing we wanted to know about each speech was the categories it belongs to. Following the general question we came up with at the beginning, each speech is assigned a time, a place, a topic and an audience. We recorded these for the 200 speeches in a .csv file so that we could sort them and do the research and analysis we wanted. This was the most time-consuming step: it took the three of us almost two weeks to read through every speech and categorize it correctly. Some very clear patterns emerged once we finished the file. Almost half of the speeches took place in D.C., where the White House is located. The most frequent word in speeches to students and at commencements is “education”, and the next most frequent are “job” and “economy”. These are interesting observations to have before digging deeper into the data. What is Obama’s purpose in giving speeches to high school students and graduates? To encourage them to use their education to find good jobs and make the U.S. economy better, isn’t it?

After we finished the Excel sheet, the next step was to sort the corpus speech by speech. We created several folders and sorted the 200 speeches by time, place and audience for further research. We also made another spreadsheet for Gephi that contains every single word Obama used in these five years (it is nearly 70,000 lines long).
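Sorting the speeches by their metadata can be sketched in Python as follows. This is an illustration of the idea rather than the workflow we actually used (the sorting was done by hand in a spreadsheet and folders); the column names, file names, and rows are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical metadata rows: the column names and values are assumptions.
METADATA_CSV = """file,year,place,topic,audience
speech_001.txt,2009,Washington D.C.,Economy,Politicians
speech_002.txt,2010,Washington D.C.,Education,Students
speech_003.txt,2012,Ohio,Election,General public
"""


def group_by(rows, field):
    """Collect speech file names under each distinct value of one metadata field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row["file"])
    return dict(groups)


# An in-memory string stands in for the real .csv file on disk.
rows = list(csv.DictReader(io.StringIO(METADATA_CSV)))
by_place = group_by(rows, "place")
by_audience = group_by(rows, "audience")

if __name__ == "__main__":
    for place, files in by_place.items():
        print(place, files)
```

Each group plays the role of one folder: its files can be concatenated and loaded into a tool like Voyant as a single text.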

  • The First Step

We followed the order in which our course introduced the software and platforms, and we tried each one to see whether it would give us something useful or meaningful. So the first step was the powerful word analysis tool Voyant. The best thing about Voyant is that its word cloud is very clear and easy to adjust. We put the sorted texts into Voyant according to our interests. As the following pictures show, each word cloud is clearly different. In 2012 the word “Romney” takes up a large part, which reflects that year’s election. And when the texts are sorted by audience, they show that Obama literally talks about different things to different people. One could guess which cloud belongs to which audience group even if I erased the labels.


One more interesting finding is the total word usage.


According to the data above, Obama used more than twice as many words when talking to politicians as when talking to the military. It is reasonable to conclude that the writers behind Obama’s speeches thought about how to make each speech more effective. In fact, we have all had the experience of needing to say the same thing in different ways to different people. So we can see that the strategy and construction of a speech depend heavily on whom it is given to.

  • The Map

Another very important part of our project is the map. We got the idea of denoting each place with a single word from the DuBois exhibition early this semester. We all thought it would make our project attractive to readers. As the following pictures show, each place or country where Obama gave speeches was assigned a word.


The word/world map represents the speeches geographically. The most useful way to interpret the map is through how each place or country is connected with American politics and the economy. It is reasonable that words like “Nuclear” appear on Moscow and “Security” on Afghanistan. There are also “Partnership” on Australia and “Democracy” on Brazil, which are not shown on the map above.

To make this map, I first ranked all the places and countries in decreasing order of the number of speeches given there, so that the big, important words would go to the matching places. Then I put the texts sorted by place into Voyant and looked for the most frequent word. As I worked down the ranked list, I avoided reusing words so that the map would be more meaningful. I also disregarded the most common words, like “American” and “People”.
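The procedure above was carried out by hand with Voyant, but it amounts to a simple greedy assignment that can be sketched in Python. The speech counts and word counts below are invented for illustration; only the rule itself (rank places by number of speeches, take the most frequent word not yet used, skip very common words) comes from the description.

```python
from collections import Counter

# The two very common words the post says were disregarded.
COMMON_WORDS = {"american", "people"}


def assign_words(word_counts_by_place, speech_counts, common_words=COMMON_WORDS):
    """Give each place its most frequent word not already taken by a higher-ranked place."""
    used = set()
    assigned = {}
    # Places with more speeches choose first.
    ranked = sorted(speech_counts, key=speech_counts.get, reverse=True)
    for place in ranked:
        for word, _count in word_counts_by_place[place].most_common():
            if word not in common_words and word not in used:
                assigned[place] = word
                used.add(word)
                break
    return assigned


# Hypothetical numbers, chosen only to show a repeated word being skipped.
SPEECH_COUNTS = {"Washington D.C.": 90, "Afghanistan": 3, "Moscow": 2}
WORD_COUNTS = {
    "Washington D.C.": Counter({"people": 50, "economy": 40, "security": 30}),
    "Afghanistan": Counter({"security": 20, "troops": 10}),
    "Moscow": Counter({"security": 9, "nuclear": 8}),
}

if __name__ == "__main__":
    print(assign_words(WORD_COUNTS, SPEECH_COUNTS))
```

In this made-up sample, Moscow's top word is already taken by Afghanistan, so it falls through to its next word, which is the same "avoid repeats" rule applied down the ranked list.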

  • Relationships


Besides word analysis, we were also interested in the possible relationships among his speeches. The visualization above shows the relationship between places and topics in Google Fusion Tables. D.C., the biggest node, is in the middle, with almost all of the topics (yellow nodes) around it. We can also see that the Religious topic was addressed only twice in these five years. This visualization is interactive on our website: one can drag the points to see the relationships.

  • World of Word


This might be the most complex and beautiful visualization we have made so far. In my opinion, Gephi is the strongest tool for data visualization, but I have not made full use of all its functions. This 10,000-node graph shows all the words used in the five years. It looks like a tree ring, and several rings can be seen. There are five rings in total if we do not count the scattered points outside the graph, which are words that appeared only once. Each of the five rings represents the words used in a certain year.

  • Conclusion

I like this topic very much, and we put a lot of effort into this final project as a group. I think we could do much more with the corpus. Combining sentiment analysis in Jigsaw with a timeline in Palladio would be worth trying, but we did not have enough time to do everything. The Gephi visualization could also be analyzed in more detail.