Curriculum Visualization with Palladio and Google Fusion Tables

For this assignment, I chose to visualize Bucknell’s database of course information for the Fall of 2015. Specifically, I wanted to see the relationships between CCC (College Core Curriculum) requirements and courses across different departments in the University.

I created my dataset by generating a CSV file of course data scraped from Bucknell’s online course database. Each row consists of a course number and a CCC requirement filled by that course. This setup means that there are some duplicate entries in the table for classes that fill multiple requirements or have multiple sections. I used this structure because Palladio and Google Fusion Tables can size nodes by how often they appear in the data, which here reflects the number of sections or requirements filled. Unfortunately, this also means that some information is lost, as there is no visual representation of the courses that do not fill any requirements. Additionally, by choosing to scale the sizes of course nodes by the number of times they appear in the CSV, we lose the ability to scale nodes by the number of students enrolled in each course, which may also be of statistical significance. As this is a large dataset, I will be applying filters by department to show subsets of the data when displaying screenshots of my visualization.
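To make the row structure concrete, here is a minimal sketch of how a two-column CSV like the one described could be read and how frequency-based node sizes could be counted. The column names (`course`, `requirement`) and the sample rows are hypothetical placeholders, not values from the actual scraped file, and this is not how Palladio or Fusion Tables compute sizes internally.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows: column names and values are illustrative only,
# not taken from the actual scraped file.
SAMPLE_CSV = """course,requirement
ECON 101,REQ_A
ECON 101,REQ_B
ECON 101,REQ_A
CSCI 201,REQ_C
"""

def load_edges(text):
    """Return a list of (course, requirement) pairs, one per CSV row."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["course"], row["requirement"]) for row in reader]

def node_sizes(edges):
    """Count how many rows mention each node (the frequency used for sizing)."""
    sizes = Counter()
    for course, requirement in edges:
        sizes[course] += 1
        sizes[requirement] += 1
    return sizes

edges = load_edges(SAMPLE_CSV)
sizes = node_sizes(edges)
```

Note that a course with no requirement never produces a row, so it never receives a size at all, which is the loss of information mentioned above.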

In Palladio’s table view, we can see how the software combines the multiple requirements for courses into a single table row, but it doesn’t keep track of the number of sections or links between courses. In fact, this table essentially reproduces the data that is already available on Bucknell’s online database, so there is no knowledge generation or anything interesting or thought provoking about this view. Similarly, the gallery view does not provide any additional insight into the data, as it does not intuitively visualize the data I have provided.



The graph view, on the other hand, does an excellent job of relating the different courses and their associated CCC requirements. Edges are drawn between courses and CCC requirements, and nodes are sized according to their frequency of appearance in the dataset. Because the view is interactive, you can click and drag nodes to look at them individually and see in detail how the edges are connected. Palladio also allows you to highlight one set of nodes, which I used on CCC requirements, as they tend to be difficult to find amongst all of the courses.
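The graph structure described here can be sketched as an adjacency map built from the course and requirement pairs. This is only an illustration under assumed data: the rows below are hypothetical, and the code does not reproduce Palladio's own layout or rendering.

```python
from collections import defaultdict

# Hypothetical (course, requirement) rows; repeated rows stand for repeated sections.
rows = [
    ("ECON 101", "REQ_A"),
    ("ECON 101", "REQ_A"),
    ("ECON 101", "REQ_B"),
    ("CSCI 201", "REQ_B"),
]

def build_adjacency(rows):
    """Map every node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for course, requirement in rows:
        adjacency[course].add(requirement)
        adjacency[requirement].add(course)
    return adjacency

adjacency = build_adjacency(rows)

# The requirement nodes are the set a viewer would choose to highlight.
highlighted = {requirement for _, requirement in rows}

# Courses connected to one requirement node, as seen when inspecting it.
neighbors_of_req_b = sorted(adjacency["REQ_B"])
```

Because edges are stored in sets, repeated rows collapse into a single edge; the repetition only matters for node sizing.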

Although Palladio is able to produce visually pleasing graphs, I found that Google Fusion Tables was able to produce even more beautiful graphs with some of the fine-tuning that this robust tool offers. I was able to color nodes by their category, which is especially useful for the larger dataset, which has 23 nodes representing the CCC requirements that need to be easily differentiated from the hundreds of course nodes. Additionally, Fusion Tables highlights the edges of the node that is currently being moused over, allowing the viewer to easily see which nodes are connected to that node by edges. Lastly, Google Fusion Tables is better at scaling the size of nodes to make it easier to see which nodes are larger than others.

For this assignment, I have included network graphs for the Economics and Computer Science departments at Bucknell. From the complexity of the Economics graph, with its many nodes and edges, it is immediately evident that this department is much more diversely distributed between curricular requirements as compared to Computer Science. Different people can look at these graphs and come up with different conclusions about the data. For example, from my perspective as a student, it makes sense that introductory courses have more links to CCC requirements because these courses are intended for the general student population, particularly those looking to fill their requirements with courses from other disciplines. As a result, this graph may be more useful to individuals more familiar with the intricacies of the curricular structure at Bucknell. They could use the full dataset of courses and CCC requirements to develop models and theories on top of this visualization to identify issues with the curriculum. It is therefore important to consider the intended audience of what is being created, since data that might be intuitively understood by one person might not be as straightforward to another.
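The department comparison can also be expressed as simple counts. The sketch below assumes that each course number begins with a department code followed by a space (for example `ECON 101`); that format and all of the rows are hypothetical, so the numbers say nothing about the real Economics or Computer Science data.

```python
# Hypothetical (course, requirement) rows, not real Bucknell data.
rows = [
    ("ECON 101", "REQ_A"),
    ("ECON 101", "REQ_B"),
    ("ECON 202", "REQ_C"),
    ("CSCI 201", "REQ_B"),
    ("CSCI 201", "REQ_B"),
]

def department_summary(rows, prefix):
    """Count distinct courses, requirements, and edges for one department prefix."""
    # Assumption: the department code is the first whitespace-separated token.
    subset = {(c, r) for c, r in rows if c.split()[0] == prefix}
    courses = {c for c, _ in subset}
    requirements = {r for _, r in subset}
    return {
        "courses": len(courses),
        "requirements": len(requirements),
        "edges": len(subset),
    }

econ = department_summary(rows, "ECON")
csci = department_summary(rows, "CSCI")
```

A department whose subset touches more distinct requirements and has more edges would produce the denser graph.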

I believe that the above graphs meet most, but not all, of Lima’s functions for network visualizations. This system of relations has never been documented before, since we mostly see this type of course information in table form. The system clarifies our perspective of the information, since the graph representation allows for a better means for humans to understand the data by drawing edges between related nodes, sizing nodes by frequency, and color coding nodes by type. These visualizations allow individuals to look at the curriculum, either by specific subgroups or in its entirety, to find patterns in its structure such as potential gaps or overlaps in the University curriculum. The graphs, however, aren’t very good at showing multidimensional aspects to the data, due to the simplicity of the input data, since the input CSV file only had two columns. This issue partially stems from the limitations of Palladio, as the software has trouble supporting large datasets.

Assignment 3: Networks and the Inter-Connectivity of Child Vocabulary

After my initial findings from assignment 2, I was curious to see what other conclusions could be drawn from my corpus of children’s speech. The most interesting finding from that assignment was the ability to plot the relative relation of individual word usage by both frequency and spatial relation. It became apparent from these findings, somewhat intuitively, that mapping the spoken utterances to the actual individuals who spoke them would be the next step in visualizing the data set. Instead of merely mapping the words to each other, determining the relation between the words spoken and the characteristics of the speaker has the potential to lead to some interesting, and hopefully enlightening, conclusions.

Determining which aspects of the speakers to map to word usage (and how exactly to do this) was initially a challenge, especially in converting the data into a CSV format. I contemplated whether individual word frequencies would be a useful metric for analysis, or if dividing up my word data into sub-categories for various aspects of speech would prove more fruitful. As far as speaker characteristics, I decided that two of the most general (but also most insightful) factors would be individual gender and individual age. After parsing back through my original data set in order to map this gender and age data, I realized that individual word categories might not be as informative as a mapping of all word utterances in relation to speaker characteristics. While breaking up the words into parts of speech or by noun types might have been interesting, mapping overall word usage seemed likely to produce a stronger visualization as a whole.
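As a rough illustration of the conversion step, the sketch below turns utterance records into two-column rows pairing each word with a speaker characteristic. The record layout (age, gender, text), the lowercasing, and the whitespace tokenization are all assumptions made for the example; the actual corpus format and parsing are not described in this post.

```python
import csv
import io

# Hypothetical utterance records: (speaker age, speaker gender, utterance text).
utterances = [
    (1, "female", "more juice"),
    (2, "male", "more juice please"),
    (2, "unknown", "Juice"),
]

def word_rows(utterances, attribute_index):
    """Yield one (word, attribute) row per word token in each utterance."""
    for record in utterances:
        # Assumption: lowercase and split on whitespace as a crude tokenizer.
        for word in record[2].lower().split():
            yield word, record[attribute_index]

def to_csv(rows, header):
    """Serialize rows as CSV text with the given header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

age_csv = to_csv(word_rows(utterances, 0), ["word", "age"])
gender_csv = to_csv(word_rows(utterances, 1), ["word", "gender"])
```

Each resulting file has the word as one column and the speaker characteristic as the other, which is the two-column shape a network tool can read as edges.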

Age Vocab Visualization

This first visualization maps vocabulary usage to the age of the individual speaker. The highlighted nodes represent different ages, while the remaining nodes represent the words uttered by individuals of the ages they connect to. This visualization is very interesting in mapping the intersecting nature of vocabulary and word usage among different age groups. We see a large concentration of words branching off of the two lower-most age nodes (representing the ages of 1 and 2), but also a large number of intersections between the two. As the age goes up, the interconnectedness of vocabulary only grows, with higher age groups clustered together above the lower age groups. If this weren’t so hard for Palladio to render on its own, I’d be very interested in increasing the data size with a larger vocabulary and more age groups to see just how extensive this age-related connectivity really is.

Gender Vocab Visualization

My second visualization maps vocabulary usage to the recorded genders of the individual speakers. I find this visualization particularly interesting in how clearly it conveys the differentiation between the vocabularies of the various genders. While one might intuitively assume that essentially all, or at least a majority, of the vocabulary should be spread evenly between speakers of each gender, we can see that this doesn’t appear to be the case. The three recorded gender categories (male, female, unknown) have a good deal of intersection between them, but an even greater amount of vocabulary that is unique to a single category. From the network, we can analyze the varying ways in which individuals of different genders form vocabularies and where they overlap.
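The overlap described here can be checked directly with set operations. The following is a sketch on made-up (word, group) rows, not the real corpus; the same functions would apply equally to age groups.

```python
from collections import defaultdict

# Hypothetical (word, gender) rows, not the real corpus.
rows = [
    ("juice", "male"),
    ("juice", "female"),
    ("ball", "male"),
    ("dog", "female"),
    ("more", "unknown"),
    ("more", "male"),
]

def vocabulary_by_group(rows):
    """Collect the set of distinct words used by each group."""
    vocab = defaultdict(set)
    for word, group in rows:
        vocab[group].add(word)
    return vocab

def shared_and_unique(vocab):
    """Split words into those used by several groups and those unique to one."""
    groups_per_word = defaultdict(set)
    for group, words in vocab.items():
        for word in words:
            groups_per_word[word].add(group)
    shared = {w for w, groups in groups_per_word.items() if len(groups) > 1}
    unique = {
        group: {w for w in words if len(groups_per_word[w]) == 1}
        for group, words in vocab.items()
    }
    return shared, unique

vocab = vocabulary_by_group(rows)
shared, unique = shared_and_unique(vocab)
```

Comparing the size of `shared` with the sizes of the `unique` sets gives a numeric counterpart to the visual impression of intersection versus separation.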

Both of these visualizations, though capable of spawning analysis and conclusions, are more representations than they are knowledge generators. This is largely because, despite the various lines denoting connections between nodes, the spatial relation between nodes doesn’t carry any meaning in itself. It is the connections themselves that have the meaning. Because of this, we are able to look at these visualizations and see a particular mapping of information, but aren’t able to use the mappings themselves to discover substantially new information. The current arrangement of nodes and connections was generated automatically by Palladio in order to better display the central nodes and more clearly represent the connections along each branching path. Nodes on opposite ends of the mapping are no more unrelated than unconnected nodes in each other’s immediate vicinity. To view networks we must not think in terms of place, but in terms of connection.