One year, mucking around in the manuscript corpus of Piers Plowman. One manuscript. One visualization. Every week. For FIFTY-TWO weeks.
This blog marks the commencement of a year-long intensive digital exhibition of the manuscript corpus of Piers Plowman. Now the fifty-two physical manuscripts themselves are distributed across twenty-one different libraries or repositories in four different countries on two continents. Due to these physical constraints, I’m clearly not putting them all in one room (though I do think that would be a fabulous and exciting endeavor) and then blogging about it. Nor am I simply going to be creating an “online exhibition” of each manuscript, though that kind of collecting is a part of this endeavor. Neither the material nor the digital “presence” of the Piers Plowman manuscripts is really on offer here. This blog is by no means an attempt to translate either the physical body of Piers, or the text each one instantiates, into digital form.
Most importantly, I’m not offering a surrogate (or surrogates) for the manuscripts themselves–something that would allow you to do some kind of substantive research upon a manuscript object, just at a distance. Surrogacy and text encoding are both projects that are already well underway at the Piers Plowman Electronic Archive and even from some of the various repositories that have digitized their Piers manuscript(s).
What I’m offering instead are slices of data and a data infrastructure for making both the text and the materiality of Piers Plowman accessible online and as part of the growing semantic web of linked data.
This project, then, is twofold. For the next year, I will be making a minimum of two posts per week.
1. Creating Digital Objects to Represent Physical Objects
One post each week will be a very sexy JSON description of a particular manuscript containing Piers Plowman–that is, I will use JavaScript Object Notation to encode real-world, physical data about the individual manuscripts (and the text they contain) in a format that is not only online and searchable, but machine-queryable for automated data extraction.
The accumulation of JSON objects and their progressive encoding will result, in fifty-two weeks, in the data for the entire corpus of fifty-two manuscripts being made available, searchable, and mine-able for large-scale linked data operations to come.
In the meantime (that is, the digital middle ages between the advent of interconnected technology and the realization of its potential in massive amounts of linked data online), JSON objects are as easily human-readable as they are machine-readable. Collecting JSON descriptions of each manuscript, then, will be a kind of 21st-century cataloguing effort to compile information that does not exist in a single location anywhere else.
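To make the idea concrete, here is a minimal sketch of what one week's description might look like and how it could be queried. The field names and values below are my own illustration, not the project's actual schema; the shelfmark (Bodleian MS Laud Misc. 581, a well-known B-text witness) is real, but the details attached to it are placeholders.

```python
import json

# Hypothetical sketch of one manuscript record; fields are illustrative,
# not the project's actual schema.
record = {
    "shelfmark": "Oxford, Bodleian Library, MS Laud Misc. 581",
    "version": "B",                           # A-, B-, or C-text tradition
    "date": {"notBefore": 1380, "notAfter": 1400},  # placeholder range
    "repository": "Bodleian Library",
    "contents": ["Piers Plowman"],
}

# Serialized, the record is human-readable...
text = json.dumps(record, indent=2)

# ...and machine-queryable: e.g., extract every B-text witness from a
# corpus (here, a corpus of one) without re-reading prose descriptions.
corpus = [record]
b_texts = [m["shelfmark"] for m in corpus if m["version"] == "B"]
print(b_texts)
```

The same filter would work unchanged once all fifty-two records exist, which is the point of encoding them consistently.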
That’s right. You heard me (so to speak). There is no single repository of all the manuscript information for codices containing Piers Plowman in any one book, site, or source of any kind. There are manuscript descriptions in all the major sets of scholarly editions (from Skeat, Kane & co., and A.V.C. Schmidt), but all of these descriptions tend to contain textual details only. And even then, those details tend to be only the ones that most help the editors on their mission (to figure out what is the “authoritative text” of Piers). Any other information one might want to know about the codices as objects, or about the other texts within them, is generally not included.
Over the course of the year’s encoding, manuscripts will be JSON-objectified in roughly chronological order, drawing attention to the temporality and complexity of manuscript copying over the course of the poem’s production. By annotating a different manuscript each week, I hope to highlight the rich history of Piers copying that goes largely unnoticed because of the way the apparatus of scholarly editing has to work.
Each JSON post will also introduce the concepts and skills behind JSON-writing and linked-data architecture. I am not an expert in this area, but I am learning as I go, and I aim to make the process as transparent as possible so that anyone interested in what this kind of data publication can do for their own project can make use of the same structures.
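For the linked-data side, one common route is JSON-LD, which layers globally resolvable identifiers onto plain JSON via `@context` and `@id` keys. The sketch below is my own illustration of the principle under hypothetical URIs (`example.org` throughout), not the project's actual architecture.

```python
import json

# Illustrative only: a JSON-LD-style record. The @context maps plain
# field names to (hypothetical) term URIs; the @id gives the record
# itself a stable, resolvable identifier.
record_ld = {
    "@context": {
        "shelfmark": "http://example.org/terms/shelfmark",
        "heldBy": {"@id": "http://example.org/terms/heldBy",
                   "@type": "@id"},
    },
    "@id": "http://example.org/piers/manuscripts/laud-misc-581",
    "shelfmark": "Oxford, Bodleian Library, MS Laud Misc. 581",
    # Pointing heldBy at an identifier rather than a name string lets
    # other datasets join on the same repository unambiguously.
    "heldBy": "http://example.org/repositories/bodleian",
}
print(json.dumps(record_ld, indent=2))
```

The design payoff is that two datasets describing the same repository under different display names still agree on one URI, which is what makes large-scale linking possible later.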
2. Metadata Visualizations
Trust me, it’s cooler than it sounds. Metadata is, broadly defined, data about data. Now, whether fifty-two objects are enough to call anything written about them “metadata” is another thing altogether, but I’m calling the corpus-work I’m doing “metadata” because it sits on the same structural level as other, old-school forms of metadata, like libraries and catalogues. This is a digital library and catalogue of information about the complete manuscript corpus.
Each week I will post a new (or never-before-seen) visual representation of some aspect of the corpus or the manuscripts’ varying textual content, designed to make a previously inaccessible or hard-to-distinguish pattern recognizable. Why–you might ask–would I want to do this fifty-two different times (or possibly more!)? Frankly, because I have more data than I can possibly make use of in my own research. With well over 250 pages of manuscript notes, there is a great deal of information I gained from viewing the MSS that would otherwise never make it into any publication.
But the reason you should care is because no single data-organization, schema, or visualization by itself can capture all of the information available, or even all of the information about one aspect. Each infographic, chart, table, graph, map, etc. makes available a different view of an infinite multiplicity of swaths of data. Each one is merely a slice of a real-world phenomenon, in much the same way that a slice of a material object is put on a microscope slide for magnified scrutiny. We can only hope to know something very particular based upon the way we have made our slice and the instruments we have to view it. Each different slice and instrument allows us to see and analyze a distinct component of the total phenomenon, and it is only through an overlapping multiplicity of slices that the phenomenon (or an aspect of it) starts to come into focus.
Yeah, but is this meaningful? Well, I’d like to say only time will tell, but it’s not just time. It’s participation, digestion, and recapitulation that tell us whether or not a particular slice revealed something significant. Only when something is picked up and re-used or when it inspires a new reflection for someone to build upon (with proper citation, of course! I’ve got my own academic career to uphold, after all…) will we know what was most important about that angle. Indeed, it’s very medieval of us, in a sense. In the same way that Mary Carruthers details the conferring of authority upon a text through its communalization and wide dissemination, so too is it with bits of data. Those that get picked up, digested, and incorporated again and again become the touchstones of the existing conversation through a massive social and collaborative (if undirected) effort of a scholarly or intellectual community.
Bonus: The Material and the Discursive (and everything in the middle)
I also want to point out that there may well be other posts that engage more intimately with the texts of Piers Plowman, the theory of informatics or media, and critical theory on meaning-making in material discourse. These, however, will emerge only as they come, evolving organically out of existing research or from conversations carried on in the comments or via Twitter. And I suppose that is the most important component of this project as a blog. This is an ongoing and unfolding project designed not only to make information available, but to allow more interaction around the data to take place and to allow the data itself to grow and fill in as more people get involved. It is, indeed, experimental and somewhat risky, but I hope it will prove both fruitful and provocative, bringing to life not only the Piers Plowman manuscripts and the corpus and text they constitute, but also making digital corpus studies an easy and compelling thing for more scholars to undertake.