In a recent talk she gave for the Medieval Forum and the Anglo-Saxon Studies Colloquium, Dorothy Kim discussed the importance of aesthetics in designing and implementing digital architectures that are not only “user-friendly” but also inviting to the potential consumers of the information that the Archive of Early Middle English was trying to make available.
Kim’s talk got me thinking about something inherent in the visual presentation of data that doesn’t get a lot of discussion. We (i.e. the people doing data visualizations and writing about them) are all so consumed with presenting information that discussion of the way information is presented, and of the choices involved, often gets left out of conversations about big data.
In some places, the conversations going on about “infosthetics” and even about “data art” help to point out the role that aesthetics plays not only in disseminating complex information, but in disseminating it in such a way that it is accessible and even enjoyable to a non-specialist audience. I’ll let you go down the rabbit hole of information aesthetics all by yourself (if you’re an infographic geek like myself, kiss the next three hours goodbye), but I do want to take a moment to highlight the importance of aesthetic decisions in data presentation.
The more visual cues you can give to someone reading your data outcomes, the more likely it is that she is going to walk away with something useful, BUT it is important that we acknowledge that each of these decisions we make on the presentation end of the data materially changes the possible outcomes and interpretations others can make from our information.
That is, the more choices we make about the data in our presentation of it, the more interpretation is already being done to the data before it even becomes “data” that can be consumed.
Let’s take for instance the network graphs of Piers Plowman (or Chaucer’s corpus) that we’ve already looked at. Most of those graphs have a color scheme to help sort the information. Our original graph has a single color for all its nodes to indicate participation in the corpus. If we then want to make a slightly more complex graph from nearly the same information, we have to find a way to make the added information more accessible.
In the case of Piers Plowman, then, we might want to divide the Piers node based on the three traditional distinctions scholars already make between varieties of the Piers text: A, B & C, which have been functioning as textual categories since Skeat denominated them in the late nineteenth century.
If we just separate out A, B, and C texts from the main Piers node in the master graph (that is, we pull that big node apart into three proportionally sized nodes to represent A, B, and C manuscripts) we get a graph like this. What you see above is a Reingold-Tilford layout of this new network that I like for its very clean aesthetic.
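For readers who want to try the "pull the big node apart" step themselves, here is a minimal sketch of one way it could be done. It is not the code behind the graphs in this post: the records are made-up placeholders rather than real manuscript data, and the names `manuscripts` and `split_piers_node` are hypothetical. Each version node is sized by the number of manuscripts containing that version, and each edge is weighted by how many manuscripts the two items share.

```python
from collections import Counter

# Hypothetical toy records, NOT real manuscript data: each manuscript lists
# the Piers version(s) it contains and the other works it also contains.
manuscripts = [
    {"versions": {"A"}, "works": {"Work 1", "Work 2"}},
    {"versions": {"B"}, "works": {"Work 3"}},
    {"versions": {"C"}, "works": {"Work 1"}},
    {"versions": {"A", "C"}, "works": {"Work 2"}},
    {"versions": {"C"}, "works": set()},
]


def split_piers_node(records):
    """Replace one Piers node with one node per version (A, B, C).

    Returns (sizes, edges): sizes counts manuscripts per version, and edges
    counts manuscripts shared by each (version, work) or (version, version) pair.
    """
    sizes = Counter()
    edges = Counter()
    for ms in records:
        versions = sorted(ms["versions"])
        for version in versions:
            sizes[version] += 1
            # Connect this version to every other work in the same manuscript.
            for work in ms["works"]:
                edges[(version, work)] += 1
        # Connect versions that occur together (hybrid manuscripts).
        for i, first in enumerate(versions):
            for second in versions[i + 1:]:
                edges[(first, second)] += 1
    return sizes, edges


sizes, edges = split_piers_node(manuscripts)
```

The resulting counts could then be handed to whatever graphing tool you prefer, with `sizes` driving node size and `edges` driving which connections get drawn.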
This graph, while obscuring some information, makes certain points very clear. No, you cannot read the names of any of the other works to which the three major nodes are connected (i.e. the names of works the A, B, and C texts share manuscripts with), but you can very clearly see the relationship between each of the three nodes.
- We can see that C presents by far the biggest node, but that
- the smallest node (A) is connected to by far the most other works.
- We can also see that there are connections between each of the three major nodes, indicating that A and C texts occur together far more often than any other two forms of the text, but
- all possible types of hybrids do occur.
- Moreover, A and C share the greatest number of non-Piers works,
- and these nodes grow proportionally more often,
- while B’s large “fan” of additional works seems to occur largely within this B-exclusive context.
If we want to know more about the specifics of these works, however, we must pull apart this line of nodes in order to better see what ends up where. As you can see, I’ve moved the major “fans” from the above graph apart here so we can more clearly distinguish what the individual nodes are. Doing this, however, obscures some of the relationships that were otherwise very clear in the above, simpler graph.
If I want this information to be just as accessible, then, I need to start organizing it better visually. In order to signal what nodes end up in MSS with a particular version of Piers, then, I turn to color-coding. This allows me to still see large patterns in the network graph, but not to have to trace every single network connection to make sense of what shares a manuscript with what else. To make this information as simple and accessible as possible, therefore, I assigned each of the three base texts a primary color (red, yellow or blue). This allowed me not only to signal what was in an A-only manuscript or a B-only one, but also to rely on people’s elementary knowledge of colors to draw conclusions about nodes that are in secondary colors. That is, if a node is green, for instance, green is a mix of blue and yellow, so a graph reader would be able to deduce that green nodes belong to works that occur alongside BOTH A and C texts in the manuscripts.
This logic did lead to the rather unfortunate choice of brown for those nodes that are in manuscripts with all three of the basic textual variants, though. Nevertheless, from this graph, you can very clearly see what kinds of works tended to circulate with Piers regardless of which textual variety was in a manuscript, and which ones tended to circulate with Piers the most often, since those circulating with more variants also seemed to just be more frequently occurring in the overall corpus.
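The color logic described above can be written out as a small lookup. This is only a sketch under stated assumptions. The post tells us that A plus C gives green and that all three together give brown, which implies that B is red, but which of A and C is yellow and which is blue is a guess here. The orange and purple mixes are the standard secondary colors and are not named in the post. The names `PRIMARY`, `MIXES` and `node_color` are hypothetical.

```python
# Assumed assignment: the post only implies that the A and C colors mix to
# green (so one is yellow and the other blue) and that B is therefore red.
PRIMARY = {"A": "yellow", "B": "red", "C": "blue"}

# Standard mixes of two primary colors; only green is named in the post.
MIXES = {
    frozenset({"yellow", "blue"}): "green",
    frozenset({"red", "yellow"}): "orange",
    frozenset({"red", "blue"}): "purple",
}


def node_color(versions):
    """Return the color for a work, given the Piers versions it travels with."""
    colors = frozenset(PRIMARY[v] for v in versions)
    if not colors:
        raise ValueError("a node must share a manuscript with at least one version")
    if len(colors) == 1:
        # Works found with only one version keep that version's primary color.
        return next(iter(colors))
    if len(colors) == 2:
        # Works found with two versions get the matching secondary color.
        return MIXES[colors]
    # Works found with all three versions get the (unfortunate) brown.
    return "brown"


print(node_color({"A", "C"}))       # green
print(node_color({"A", "B", "C"}))  # brown
```

Keeping the mapping in one place like this is also what makes it easy to reuse the same scheme in every other graph that needs the A-B-C distinction.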
Once I’d made this color choice for my different nodes, however, I could communicate A-B-C information in whatever other graphs I wanted to add that component to.
Take this graph, for instance, the top portion of which you may recall from the blog on the Manuscript Life-Cycle of Piers. Now, I can compare my original information on the manuscript and the length of the Piers poem in it presented over time to that same temporal information now divided into A-B-C lines.
A color-code applied consistently throughout a data set allows viewers of the data set to follow easy (and often unconsciously processed) visual cues to sort out the information you are presenting.
It also easily signals to viewers that whenever they do not see the color code, they are not looking at information that has been parsed for those particular categories. Thus, when you look at the top bar chart, the color scheme clearly tells you that there is no A-B-C information here, in contrast to the bar graph below it. It also means that when you look at other network graphs with other color schemes, you are looking at a different way of slicing the data,