Assignment 3

[Figure: word cloud (Wordle) of epithets describing Slavs]

I chose to visualize data from a discourse analysis of Mount Carmel Daily Item newspaper articles containing the word “Slav” published between 1892 and 1910. I created the dataset in order to make a word cloud (above) representing the perception of Slavs in the coal region at the turn of the 20th century. The analysis initially included the following data categories: date of publication, location, article title, epithets (specifically, the words used in each article to describe Slavs, split into three categories: modifiers, verbs, and nouns), and people. To prepare the dataset for visualization in Palladio, I added geographic coordinates, removed people, and reorganized the epithets into three frequency categories: race, class, and total.
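To make the reorganized dataset concrete, here is a minimal Python sketch of how per-article epithets could be collapsed into race, class, and total frequency columns and written to a CSV for Palladio. It assumes each article has been hand-tagged with (word, category) pairs; the field names, the placeholder sample values, and the choice to count every epithet in "total" are my assumptions, not the original dataset's layout. It also assumes Palladio's convention of storing coordinates in a single "latitude,longitude" field.

```python
import csv
import io
from collections import Counter

FIELDS = ["date", "location", "coordinates", "race", "class", "total"]


def epithet_row(article):
    """Collapse one article's tagged epithets into Palladio-ready frequency columns."""
    # Hypothetical input: article["epithets"] is a list of (word, category) pairs,
    # where category is "race", "class", or any other label.
    counts = Counter(category for _, category in article["epithets"])
    return {
        "date": article["date"],
        "location": article["location"],
        # Assumed Palladio format: one "latitude,longitude" field.
        "coordinates": f"{article['lat']},{article['lon']}",
        "race": counts["race"],
        "class": counts["class"],
        "total": len(article["epithets"]),  # assumption: every epithet, categorized or not
    }


def to_palladio_csv(articles):
    """Write one CSV row per article and return the CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(epithet_row(a) for a in articles)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder article; the values are illustrative, not from the real dataset.
    sample = {
        "date": "1900-01-01",
        "location": "Town A",
        "lat": 40.8,
        "lon": -76.4,
        "epithets": [("w1", "race"), ("w2", "class"), ("w3", "race"), ("w4", "other")],
    }
    print(to_palladio_csv([sample]))
```

Because the coordinates field contains a comma, the csv module quotes it automatically, so the file still parses as six columns.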

[Figure: Palladio graph view, total epithet frequency and location]

Although it was scrambled and confusing at first, I found Palladio’s graph view to be the most interesting one for this dataset. The visualization above results from inputting total epithet frequency (highlighted) and location name (un-highlighted) and sizing the nodes by article frequency. Since the labels are too small to read without zooming in, which loses the big picture, this view is most effective for seeing which entities occur in the most articles.

[Figure: Palladio graph view, location and date, nodes sized by total epithet frequency]

In an attempt to enhance the graph view’s most helpful feature, I input location and date and sized the nodes by total epithet frequency. The resulting visualization (above) is cluttered, but more easily readable.

[Figure: Palladio graph view, year nodes arranged chronologically]

To simplify the visualization, I arranged the “year” nodes in chronological order, like a timeline. This might be the most effective visualization I created in Palladio because it is simple and clear.

[Figure: Palladio graph view, year and race epithet frequency]

With the nodes for year (highlighted) and race epithet frequency (un-highlighted) organized, the visualization (above) is more revealing. The organization is inexact because the nodes were dragged by hand, but the viewer can see which years produced the most articles with the highest race epithet frequencies.

[Figure: Palladio map view of article locations]

Using the geographic coordinates in the map view, I plotted the location of each article on a world map. I wanted to use different shades of red to represent the density of entities in the articles from a particular location, but Palladio would only show one color at a time, even though there is an option to add multiple layers to the map.

[Figures: Palladio map view, zoomed out (left) and zoomed in on the Coal Region (right)]

Instead, I used the node size option to represent racial epithet frequency. Zoomed out (above left), the viewer can see all the articles scattered across the world, but the upper East Coast of the United States is covered by a single blob of color because the view is not detailed enough to show the individual articles. Zoomed in (above right), the Coal Region is visible in more detail. Since I am not familiar enough with Pennsylvania geography to identify the town each node is located in without a label, this visualization is not very useful to me. If it could be laid over a road map of Pennsylvania, it would be interesting to see which town’s newspaper articles contained the greatest frequency of racial words.

[Figure: Palladio table view grouped by year, with race and class epithet frequencies]

Although the table view can group the data by a chosen row dimension (I chose “year” for the visualization above), it serves almost the same function as the spreadsheet in which the dataset was originally organized.

[Figure: Palladio gallery view]

Similarly, the gallery view does not seem to provide any more insight into the data than a spreadsheet.  It is frustrating to use because only a small fraction of the data is visible at once, and the format is uninteresting because none of the articles I used to compile the dataset had accompanying images.

Visualizing Networks: Wars in Old Days (Collaborated with Huang, Jiayu)

We have now learned how to visualize networks using two different tools: Google Fusion Tables and Palladio. Both are very good tools; each provides useful ways to create visualizations, even though they perform and feel different from each other.

Instead of visualizing plain text, both Palladio and Google Fusion Tables use CSV spreadsheets as raw data input. In some sense, this raw data is more solid, since a spreadsheet may contain only facts; unlike essays, blog posts, etc., it would and should produce a more predictable result instead of the surprises that text analysis may provide. Interestingly, Voyant and Jigsaw are local software, while Palladio and Fusion Tables are browser-based applications, so it seems that network visualization software needs networks :D.

Raw data collection:

Setting aside the fact that people may feel differently about having to build their own CSV file to work with these tools, they are very interesting tools for network visualization, especially since they have built-in maps. That is why we chose to step away from our previous RPG analysis, which had few locations associated with it. We finally decided to visualize the wars that took place between 1900 and 1950, a period that includes the two most famous wars: WWI and WWII. Our data include each war’s start time, end time, name, countries involved, and result (indicating who won the war and who lost it).

 

Palladio, a Beautiful Platform with Potential:

First things first: unlike my partner (a Google fanboy), I think the beauty of a tool should be valued more than it currently is, especially for a humanities tool; after all, I cannot believe that a tool could generate a stunning visualization without being beautiful itself. With that in mind, I really love Palladio; although we chose to make only one visualization with it because of performance issues, I still love it. Its modern, simple design won my heart when I first opened it. However, it is really slow when processing more than 300 lines of data (which should not count as a large dataset), which makes it a pain to use. In such a digitized world, with so much information flowing every day, this limits its usefulness and puts such a wonderfully designed tool in a very embarrassing position. But since it is a fairly new tool, hopefully still under development, it has great potential; it is off to a very good start.

Putting all judgments aside, let’s look at the actual visualization:

[Figure: Palladio timeline of the number of countries involved in wars]

The previous figure shows the number of countries involved in wars over time, which Palladio calls a timeline. From the figure, we can see that the most countries were involved in a war between 1916 and 1921. Interestingly, WWI happened at a similar time. However, during WWII, approximately 1939 to 1945, far fewer countries were involved, even fewer than the number involved in wars before WWI. Still, people remember WWII more than other wars; this is not only because it is the most recent war to involve many countries around the world, but also because it did more damage to humanity. With the massive use of tanks, machine guns, aircraft, and heavy artillery, it was easier to kill people than ever before, not to mention the nuclear bomb at the end of the war. The smaller number of countries involved could suggest that warfare during WWII moved rapidly because of the many new weapons, and that warfronts changed quickly.
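One way to reproduce this kind of count outside Palladio is sketched below in Python. It assumes each war record has integer start and end years and a list of participating countries (hypothetical field names and placeholder wars), and it counts each country at most once per year. Palladio's timeline itself bins records by date, so its numbers may differ depending on how the CSV rows are structured.

```python
from collections import defaultdict


def countries_at_war_by_year(wars):
    """Count distinct countries taking part in at least one war during each year."""
    active = defaultdict(set)
    for war in wars:
        # Hypothetical fields: integer start/end years and a list of country names.
        for year in range(war["start"], war["end"] + 1):  # end year is inclusive
            active[year].update(war["countries"])
    return {year: len(countries) for year, countries in sorted(active.items())}


if __name__ == "__main__":
    # Placeholder wars, not taken from our dataset.
    wars = [
        {"name": "War X", "start": 1914, "end": 1916, "countries": ["A", "B"]},
        {"name": "War Y", "start": 1916, "end": 1917, "countries": ["B", "C"]},
    ]
    print(countries_at_war_by_year(wars))  # {1914: 2, 1915: 2, 1916: 3, 1917: 2}
```

Using a set per year means a country fighting two overlapping wars is still counted once, which matches "number of countries involved" rather than "number of war records".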

Google Fusion Tables, a Productive Tool Made by Google:

There is less to say about Fusion Tables. That does not mean it is not useful; to be honest, it is more useful than Palladio. But that is only because it is more mature, including in its look: it makes me feel like I am using super-professional software from 2008. It doesn’t look bad, just not very attractive after seeing the beauty of Palladio, and, as the fanboy says, it is super productive.

The very first visualization plots war duration against the time each war started. The upper part (with the white background) is the raw plot against start time, while the lower part (with the blue background) shows the difference from the lowest point on the plot over time. As we can see, there are periods when many wars break out in quick succession and periods when few wars happen. Also, as mentioned, older wars tend to last longer than newer ones, given all the new killing machines introduced to the world. This is especially clear when comparing WWI and WWII: although WWII made humanity suffer more, individual wars in that era were shorter than wars in the WWI era.

[Figure: Fusion Tables chart of war duration against start time]
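The claim that older wars lasted longer can be checked numerically with a short Python sketch. It assumes each war has start and end dates stored as datetime.date values (hypothetical fields; the sample dates are placeholders) and groups durations by the decade in which each war started. This is one simple way to test the trend, not how Fusion Tables computes its chart.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def durations(wars):
    """Return (start_date, duration_in_days) pairs sorted by start date."""
    return sorted((w["start"], (w["end"] - w["start"]).days) for w in wars)


def mean_duration_by_decade(wars):
    """Average war duration in days, grouped by the decade in which each war started."""
    buckets = defaultdict(list)
    for start, days in durations(wars):
        buckets[start.year // 10 * 10].append(days)
    return {decade: mean(days) for decade, days in sorted(buckets.items())}


if __name__ == "__main__":
    # Placeholder wars with made-up dates, only to show the calculation.
    wars = [
        {"start": date(1900, 1, 1), "end": date(1900, 1, 31)},
        {"start": date(1905, 1, 1), "end": date(1905, 1, 11)},
        {"start": date(1940, 1, 1), "end": date(1940, 1, 6)},
    ]
    print(durations(wars))
    print(mean_duration_by_decade(wars))
```

Averaging by decade smooths out single outliers, which matters when one very long war (or one civil war with no clear end date) would otherwise dominate the raw plot.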

The second visualization maps the countries that won (Figure a) and those that were defeated (Figure b). From it, we can see that most of the countries involved are in Europe, suggesting how bellicose Europeans were at that time. There are also many points in South America, which could represent revolutions or colonial wars, and, more interestingly, most of those countries lost their wars (2:7, victory:defeat).

[Figure a: Countries that won the war]

[Figure b: Countries that lost the war]

The last two visualizations show the relationships between the winning countries (labeled in blue) and the defeated countries (labeled in yellow). The more wars a country was involved in, the bigger its dot. We can see that China and the US have large dots. For China, this is not surprising, since it had many domestic and international conflicts at that time; but for the US, I was very surprised that it was involved in so many wars, which I had previously thought impossible since it is far away from the “main world”, Europe. The most interesting fact, though, is that many of the large dots belong to countries involved in civil wars rather than the two famous World Wars. This might be because civil wars tend to affect a vast area of land and last longer, as it is difficult to defeat all the rebels. Since the scale of each war (especially how many people died) is not part of our raw data, this may be misleading, in the sense that a larger dot does not necessarily mean people suffered more. Still, it can reveal patterns, although in the future we should think more carefully about making a visualization as little misleading as possible.

As we can see, America was involved in many wars, as its dot is very big.

[Figures: Fusion Tables network graphs of winning (blue) and defeated (yellow) countries]
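The network in these graphs can be sketched as a winner-to-loser edge list, with each country's dot sized by the number of wars it took part in. The Python below is a hypothetical reconstruction assuming each war record lists its winners and losers; it is not Fusion Tables' own code, and the country names are placeholders.

```python
from collections import Counter
from itertools import product


def war_network(wars):
    """Return winner->loser edge weights and per-country involvement counts."""
    edges = Counter()
    involvement = Counter()
    for war in wars:
        # Hypothetical fields: lists of winning and losing countries for each war.
        for winner, loser in product(war["winners"], war["losers"]):
            edges[(winner, loser)] += 1
        # Count each country once per war, whichever side it was on.
        involvement.update(set(war["winners"]) | set(war["losers"]))
    return edges, involvement


if __name__ == "__main__":
    wars = [
        {"winners": ["A"], "losers": ["B", "C"]},
        {"winners": ["B"], "losers": ["A"]},
    ]
    edges, involvement = war_network(wars)
    print(dict(edges))
    print(involvement.most_common())  # the biggest dots come first
```

Sizing by involvement alone is exactly why the dots cannot show how much people suffered: a long civil war and a short border clash each add one to the count.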

Curriculum Visualization with Palladio and Google Fusion Tables

For this assignment, I chose to visualize Bucknell’s database of course information for the Fall of 2015. Specifically, I wanted to see the relationships between CCC (College Core Curriculum) requirements and courses across different departments in the University.

I created my dataset by generating a CSV file of course data scraped from Bucknell’s online course database. Each row consists of a course number and a CCC requirement filled by that course. This setup means that the table contains duplicate entries for classes that fill multiple requirements or have multiple sections. I used this structure because Palladio and Google Fusion Tables can size nodes by how often they appear, which here reflects the number of sections or requirements filled. Unfortunately, this also means there is a loss of information, since courses that do not fill any requirements have no visual representation. Additionally, by choosing to scale the sizes of course nodes by the number of times they appear in the CSV, we lose the ability to scale nodes by the number of students enrolled in each course, which may also be statistically significant. As this is a large dataset, I apply department filters to show subsets of the data in the screenshots of my visualization.
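The row-per-requirement structure can be illustrated with a small Python sketch that reads (course, requirement) rows, optionally filters by department, and counts how often each node appears, which is roughly what frequency-based node sizing reflects. The column names, the sample course numbers, the REQ codes, and the "DEPT 123" course-code format are illustrative assumptions, not Bucknell's actual data.

```python
import csv
import io
from collections import Counter

# Placeholder data: two rows for ECON 101 / REQ1 stand for two sections.
SAMPLE = """course,requirement
ECON 101,REQ1
ECON 101,REQ1
ECON 101,REQ2
CSCI 203,REQ3
"""


def load_rows(csv_text, department=None):
    """Read (course, requirement) pairs, optionally keeping one department's courses."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["course"], r["requirement"]) for r in reader]
    if department is not None:
        # Assumes course codes look like "DEPT 123".
        rows = [(c, req) for c, req in rows if c.split()[0] == department]
    return rows


def node_sizes(rows):
    """Count how many rows each course and each requirement appears in."""
    course_size = Counter(course for course, _ in rows)
    requirement_size = Counter(req for _, req in rows)
    return course_size, requirement_size


if __name__ == "__main__":
    courses, requirements = node_sizes(load_rows(SAMPLE, department="ECON"))
    print(courses.most_common())       # [('ECON 101', 3)]
    print(requirements.most_common())  # [('REQ1', 2), ('REQ2', 1)]
```

A course that fills no requirement never appears as a row, so it never becomes a node; that is the loss of information described in the paragraph above.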

In Palladio’s table view, we can see how the software combines the multiple requirements for courses into a single table row, but it doesn’t keep track of the number of sections or the links between courses. In fact, this table essentially reproduces the data already available in Bucknell’s online database, so this view generates no new knowledge and offers nothing thought-provoking. Similarly, the gallery view does not provide any additional insight into the data, as it does not intuitively visualize the data I provided.

 

 

The graph view, on the other hand, does an excellent job of relating the different courses to their associated CCC requirements. Edges are drawn between courses and CCC requirements, and nodes are sized according to their frequency of appearance in the dataset. Because the view is interactive, you can click and drag nodes to look at them individually and see in detail how the edges are connected. Palladio also allows you to highlight one set of nodes, which I used for the CCC requirements, as they tend to be difficult to find among all of the courses.

Although Palladio is able to produce visually pleasing graphs, I found that Google Fusion Tables produced even more beautiful graphs thanks to the fine-tuning this robust tool offers. I was able to color nodes by category, which is especially useful for the larger dataset, where the 23 nodes representing CCC requirements need to be easily distinguished from the hundreds of course nodes. Additionally, Fusion Tables highlights the edges of the node currently under the mouse, allowing the viewer to easily see which nodes are connected to it. Lastly, Google Fusion Tables is better at scaling node sizes, which makes it easier to see which nodes are larger than others.

For this assignment, I have included network graphs for the Economics and Computer Science departments at Bucknell. From the complexity of the Economics graph, with its many nodes and edges, it is immediately evident that this department’s courses are much more diversely distributed across curricular requirements than those in Computer Science. Different people can look at these graphs and come to different conclusions about the data. For example, from my perspective as a student, it makes sense that introductory courses have more links to CCC requirements, because these courses are intended for the general student population, particularly students looking to fill their requirements with courses from other disciplines. That said, this graph may be more useful to people who are more familiar with the intricacies of Bucknell’s curricular structure. They could use the full dataset of courses and CCC requirements to develop models and theories on top of this visualization and identify issues with the curriculum. For this reason, it is important to consider the intended audience of a visualization, since data that one person understands intuitively might not be as straightforward to another.
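The observation about introductory courses could be tested directly on the CSV rather than by eye. This Python sketch counts the distinct CCC requirements linked to each course and averages them by course level; it assumes the first digit of the course number indicates the level, and the rows are hypothetical placeholders rather than real Bucknell courses.

```python
from collections import defaultdict
from statistics import mean


def links_by_level(rows):
    """Mean number of distinct CCC requirements per course, grouped by course level."""
    linked = defaultdict(set)
    for course, req in rows:
        linked[course].add(req)  # duplicate sections collapse to one link
    by_level = defaultdict(list)
    for course, reqs in linked.items():
        number = course.split()[1]
        # Assumption: the first digit of the course number gives its level (1xx, 2xx, ...).
        by_level[int(number[0]) * 100].append(len(reqs))
    return {level: mean(counts) for level, counts in sorted(by_level.items())}


if __name__ == "__main__":
    # Placeholder rows in the same (course, requirement) shape as the dataset.
    rows = [
        ("ECON 101", "REQ1"), ("ECON 101", "REQ2"), ("ECON 101", "REQ1"),
        ("ECON 103", "REQ3"), ("ECON 301", "REQ1"),
    ]
    print(links_by_level(rows))
```

Counting distinct requirements (a set per course) keeps multi-section courses from looking more "linked" than they are, which the frequency-based node sizes in the graphs do not separate.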

I believe that the above graphs meet most, but not all, of Lima’s functions for network visualizations. This system of relations has never been documented before, since we mostly see this type of course information in table form. The visualization clarifies our perspective on the information, since the graph representation helps humans understand the data by drawing edges between related nodes, sizing nodes by frequency, and color-coding nodes by type. These visualizations allow individuals to look at the curriculum, either by specific subgroups or in its entirety, to find patterns in its structure, such as potential gaps or overlaps. The graphs, however, aren’t very good at showing multidimensional aspects of the data, because the input CSV file had only two columns. This issue partially stems from the limitations of Palladio, as the software has trouble supporting large datasets.