@context for Trinity College Dublin MS 212

Last week, we went over how to write simple JSON to describe a manuscript object. It wasn’t a perfect description (in fact, if you noticed, I used the same “name” in two different “name”/value pairs to mean two different things! I used “folios” to refer to how many folios the MS contained and which folios Piers occupied), but it was valid code.

[Screenshot: "folios" fail]

What I want to talk about this week is how to write descriptions in JSON that can be incorporated directly into a linked data framework. I’m going to talk at more length about linked data and its basic principles in a later post. To define it in brief, though, I’ll share this definition from W3C:

Linked Data is a way to create a network of standards-based machine interpretable data across different documents and Web sites. 

Today, I simply want to show you how to go from JSON to JSON-LD in a few simple steps that aren’t very much harder than what we did last time.

In order for the data we are encoding here to be available to searches, queries, and visualizations by humans and machines other than ourselves (and our laptops), we have to define our terms by referencing a known JSON-LD context or a context that JSON-LD can parse. That is, our data must make use of existing vocabularies in order to make sense in a wider context. It’s just like communicating in any other way. Sure, you can “speak” using whatever semantic units you want, even some that are arbitrarily made up on the spot, but if you want someone else to be able to understand you, you have to use terms that are mutually agreed upon and have some built-in conventions. Same thing in code. You can make up whatever terms you like for convenience, as we did last week, but there’s no guarantee that everyone is going to want to use “MSshortHand” to denominate a “nickname” for a given manuscript. Thus “MSshortHand” is going to be unreadable to all machines (which don’t have intuitive interpretive functions) and possibly to some humans.

Indeed, in order for a component of your encoded data to be linked data, it absolutely, positively must participate in an established vocabulary and syntax. NO PROBLEM. The syntax we are using is JSON-LD. The vocabulary, though, is another thing entirely.

At present, there isn’t a single source for a great codicological vocabulary for this endeavor. We do have, however, a few online sources that can help us to establish a standard vocabulary and make our data readable. I’m going to detail the sources and discuss the vocabulary they make available as I add context to my existing JSON object.

To start, we have to begin with valid JSON. Here is a short bit of valid JSON using the same terms we used to describe Z last week:

[Screenshot: TCD212JSON]
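Since the screenshot does not carry over as text, here is a minimal sketch of what such plain JSON might look like. It is an assumption about the screenshot's contents: the name/value pairs are borrowed from the finished TCD 212 description at the end of this post and cut down to a handful for readability.

```json
{
  "MSshortHand": "TCD212",
  "HoldingLocation": "Dublin, Ireland",
  "Material": "Vellum",
  "Folios": 90,
  "Script": "Anglicana Formata"
}
```

Nothing here is linked yet. These are just names and values that only make sense to someone who already knows what the names mean.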

In order to link this information, we first need to add a context, which we’re going to do by simply adding an object at the top of our JSON-LD document. We are then going to tell it we are writing context like this:

{
  "@context": {
  }
}

The outer set of {} (which I like to keep aligned to keep track of all my pairs) says we are creating an object. The object we are creating is a known type of object, which we can see from the “@”. JSON-LD has a short list of tokens and keywords using the @ symbol to signal to JSON-LD what kind of information we are giving it. So, “@context” is a known thing to JSON-LD, and it must be formatted as an object (within our larger object).  Our entire string of code is going to include two different parts of one object: one will be context, and the other will be data.

Remember, everything inside {   } is our single object.

{
  "@context": {
  },
  ….(data entry)….
}

Now, as far as I know, most of the codicological terms we’re using here aren’t already part of a JSON-LD document, though that is something that needs further investigation.  We do know for sure, though, that there are several JSON-LD ready vocabularies already online, so wherever possible, we want to use those vocabularies to make parts of our data legible.  Some examples of existing and useable vocabularies include:

schema.org        http://schema.org
DCTerms           http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms
RDFschema         http://www.w3.org/2000/01/rdf-schema#
XMLschema         http://www.w3.org/2001/XMLSchema#
FriendofaFriend   http://xmlns.com/foaf/spec/#sec-extrefs

Now, we are going to use some of these vocabularies to define our terms in the @context object. Simply put, context is used to map terms to stable definitions or identifiers. Let’s start with our naming problem. I’m going to use the FOAF (Friend of a Friend) vocabulary because it’s the only one that gives me the opportunity to use a complete name and a nickname to describe the same object. Other than this context FOAF isn’t going to help very much, because it’s for describing social networks of persons in particular.  So, easy peasy. Two steps:

1. Just add FOAF context at the top:

[Screenshot: FOAF context]

2. And then, we can add the terms we want to include from the FOAF vocabulary. First, we add the term as the name in a name/value pair.

[Screenshot: "name"]

Then, we use another object containing an @id name/value pair to ‘define’ our term “name” according to FOAF.

[Screenshot: "name"@id]

Something else to note here: as I was adding “@id”, I noticed that I had forgotten to put “@context” in the necessary double quotes in my earlier screenshots. That is easily remedied in this one. Before I finish with FOAF, I’m going to add the nickname identifier so that I can use the shorthands I already have throughout my data without worrying that doing so will make the object less visible to machines reading it:

[Screenshot: foafcontext]

Notice that the “@context” object has its opening and closing curly brackets, which tells JSON-LD readers that the context is finished. Within the object, all the items of data are separated by commas, and the final item doesn’t have a comma because it’s followed by the closing bracket. Just as in standard language, machine languages need proper grammar. Don’t forget your “ ”s, your commas, and your {}s!
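With both FOAF terms added, the context might look like the sketch below. It stands in for the screenshot and is assembled from the finished TCD 212 script at the end of this post. The single data pair after the context is included only to show a defined term in use.

```json
{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "name": {"@id": "foaf:name"},
    "MSshortHand": {"@id": "foaf:nick"}
  },
  "MSshortHand": "TCD212"
}
```

Note the comma after the closing bracket of the context: the context is one name/value pair in the larger object, so it needs a comma before the data that follows.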

Now, let’s talk vocabularies! I want to list some existing terms and how to add them to our contexts, and then point to a kind of “best practices” for using terms that are as yet not part of any JSON-LD readable document.

 

First is DC Terms, from the Dublin Core Metadata Initiative.

Useful terms from DC Terms include:

Now, notice that some of these are all lower case and some have capitals. That’s because these terms are case sensitive, so make sure that you are consistent!

If you want to add any of these to your context it just looks like this.  Now here, you also want to note that I’m going to go ahead and use “PeriodOfTime” to define my “DateRange” with the “terminus ad quem” and “terminus ante quem” because DC Terms already gives me a start and end date (Definition: “An interval of time that is named or defined by its start and end dates”) and it allows me to keep “DateRange” in my data, but make it recognizable as a period. That does mean I’ll have to modify how I present the data itself in the data object below.

[Screenshot: DCTermsContext]

Also notice that I was able to use a new format, the prefix:suffix, to simplify my use of these new vocabularies. In both places where I introduce a vocabulary (“foaf”: and “dcterms”:) I then add a base URL as the value. Thus, whenever I use “foaf” or “dcterms” below, the computer reads the base URL in place of the prefix (“http://purl.org/dc/terms/” in the case of the latter) and simply tacks on whatever text is in the “suffix” portion, so you get “http://purl.org/dc/terms/PeriodOfTime” without having had to write all that out. This saves time in coding, but it also makes all the data more easily human-readable, since all you see is something like a word (a semantic unit we’re used to) rather than a URL with only a tiny change.
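To make the prefix:suffix idea concrete, here is a small sketch using two of the DC Terms definitions from the finished script at the end of this post. The first version uses the “dcterms” prefix:

```json
{
  "@context": {
    "dcterms": "http://purl.org/dc/terms/",
    "DateRange": {"@id": "dcterms:PeriodOfTime"},
    "language": {"@id": "dcterms:language"}
  }
}
```

The second version is what the first one means once the prefix has been swapped for its base URL. It is written out by hand here for illustration, and the two should be read as equivalent:

```json
{
  "@context": {
    "DateRange": {"@id": "http://purl.org/dc/terms/PeriodOfTime"},
    "language": {"@id": "http://purl.org/dc/terms/language"}
  }
}
```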

Ok, I know you’re tired at this point, so I’m just going to throw one more working vocabulary at you before best practices and then we’re DONE.  Hang in there.

The other really useful vocabulary comes from TEI or the Text Encoding Initiative.

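TEI is an XML-based encoding resource, which at first may seem problematic, but no worries. JSON-LD doesn’t need to read the XML itself; all it needs is a stable URL to point to, and TEI publishes a reference page online for each of its elements. The only drawback is that those pages end in .html, so we have to add .html to all our suffixes in order to make a prefix:suffix pair. Thus, we make our TEI context like this: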

[Screenshot: TEIPrefix]
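As a sketch of the TEI prefix in text form, here are the prefix and two of the definitions from the finished script at the end of this post:

```json
{
  "@context": {
    "TEI": "http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-",
    "msDesc": {"@id": "TEI:msDesc.html"},
    "msContents": {"@id": "TEI:msContents.html"}
  }
}
```

Because the base URL ends in “ref-”, the pair “TEI:msDesc.html” reads as “http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-msDesc.html”.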

TEI has a number of helpful elements that can be used the same way:

  • msDesc
  • msIdentifier
  • repository
  • msContents
  • msItem
  • author
  • title
  • physDesc
  • objDesc
  • region

Some, but not all, of these will appear in the completed code for TCD 212 below. Now, anywhere that the TEI term (“msContents”, for instance) is just as readable as the arbitrary descriptor I used (“Contents”), I just replaced my moniker with TEI’s. I could have left it and defined “Contents” as {“@id”: “TEI:msContents.html”}, but it seemed to make more sense to make my vocabulary align with the existing one instead of the other way around.
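For comparison, here is a sketch of the road not taken: keeping my own term “Contents” and pointing it at TEI’s definition in the context. The finished script at the end of this post does not do this; it uses “msContents” directly.

```json
{
  "@context": {
    "TEI": "http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-",
    "Contents": {"@id": "TEI:msContents.html"}
  },
  "Contents": ["Piers Plowman"]
}
```

Either way the data ends up pointing at the same TEI definition. The only difference is which word a human reader sees in the data.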

That concludes my survey of basic manuscript data vocabularies available from existing linked data sources. I want to turn now to a source that is extremely helpful for establishing codicology LD best practices but is not as yet (so far as I know) actually equipped for linked-data endeavors. Codicologica is an online codicological vocabulary that outlines standardized terms for codicological description. The vocabulary is extensive, but all the terms are defined in French.

[Screenshot: Codicologica]

Thus, even when you look up “support” using the English Index, you are directed to a French definition of support.

[Screenshot: Support]

It may already be usable in @context with an @language modifier, but to keep things simple we will leave it out of context because it is in French. 

Nevertheless, being based on Denis Muzerelle’s Vocabulaire codicologique, it is the standard vocabulary for codicology, so in the raw data portion of our JSON-LD, we want to make sure to use those terms rather than any other to describe elements of our manuscripts. What this does is allow for a more seamless transition into MS linked data because either Codicologica itself will become usable as linked data OR a manuscript-specific vocabulary will be built using the standard terms from Codicologica. As soon as that Codicologica vocabulary exists, we can add it to our context as “VoCod”: {“@id”: “http://…/”} and be on our merry way with even more context.

To reiterate why that is important, I want to point out that it is our @context that actually links our data. That means that anything we used as a term in a name/value pair that is not defined in the @context is not linked data. It is simply data attached to the parent object itself.  It will be accessible when we look at data about TCD 212, but it will not itself make reference to anything outside itself or be findable by any outside search or query.
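Here is a small sketch of that difference, using two name/value pairs from the finished script at the end of this post. “repository” is defined in the context; “Material” is not.

```json
{
  "@context": {
    "TEI": "http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-",
    "repository": {"@id": "TEI:repository.html"}
  },
  "repository": "Trinity College, Dublin",
  "Material": "Vellum"
}
```

A machine reading this can trace “repository” to a URL that says what the term means. It has nothing to map “Material” to, so a JSON-LD processor will typically drop that pair when it expands the document into linked data, even though a human reading the file can still see it.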

And NOW, for the BIG FINISH, here is TCD 212 in a contextualized JSON-LD script!

[Screenshots: TCD212JSON1, TCD212JSON2, TCD212JSON3]

{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "name": {"@id": "foaf:name"},
    "MSshortHand": {"@id": "foaf:nick"},

    "dcterms": "http://purl.org/dc/terms/",
    "DateRange": {"@id": "dcterms:PeriodOfTime"},
    "provenance": {"@id": "dcterms:provenance"},
    "language": {"@id": "dcterms:language"},
    "PhysicalObject": "http://purl.org/dc/dcmitype/PhysicalObject",
    "PrintedEdition(s)": {"@id": "dcterms:BibliographicResource"},

    "TEI": "http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-",
    "msDesc": {"@id": "TEI:msDesc.html"},
    "msIdentifier": {"@id": "TEI:msIdentifier.html"},
    "repository": {"@id": "TEI:repository.html"},
    "msContents": {"@id": "TEI:msContents.html"},
    "work": {"@id": "TEI:msItem.html"},
    "title": {"@id": "TEI:title.html"},
    "author": {"@id": "TEI:author.html"},
    "DialectRegion": {"@id": "TEI:region.html"},
    "publisher": {"@id": "TEI:publisher.html"},

    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "date": {"@id": "xsd:date"}
  },

  "@type": "PhysicalObject",
  "MSshortHand": "TCD212",
  "HoldingLocation": "Dublin, Ireland",
  "repository": "Trinity College, Dublin",
  "msIdentifier": "Dublin, Trinity College, MS 212",
  "Olim.": "Dublin, Trinity College, MS D.4.1",
  "DateRange": "1390-1400",
  "provenance": null,
  "AquisitionDate": null,
  "Material": "Vellum",
  "SupportQuality": null,
  "Folios": 90,
  "Script": "Anglicana Formata",
  "ScriptQuality": 5,
  "msContents": ["Piers Plowman"],
  "NumberOfWorks": 1,
  "PositionOfPiers": 1,
  "PiersFolios": 90,
  "PiersPercentMS": 100,
  "PiersTextVariety": "C",
  "LinesOfPiers": 7350,
  "DialectRegion": "Northwest Gloucestershire",
  "Collation": ["ii", "1-11⁸", "12²(lacks 2)", "1", "ii"],

  "Initials": [
    {"I": "fol. 1r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": "red",
     "Rubric": "hic incipit uisio Willelmi de Petro Plouhman",
     "Line": "In assomur seson wan softe was þe sonne"},
    {"T": "fol. 4r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Explicit primus passus incipit passus secundus",
     "Line": "That þe montaigne bymenuþ & þe merke dale"},
    {"A": "fol. 6v",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Explicit passus secundus hic incipit passus tercius",
     "Line": "And thenne y kneled on my knes & criede to heore of gce"},
    {"N": "fol. 9v",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "explicit passus tercius ~ incipit passus quartus",
     "Line": "Now mede þe mayde no mo of hem alle"},
    {"C": "fol. 16r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "explicit passus quartus Incipit passus quintus",
     "Line": "Cesseth sayde þe kyng y suffer ȝow no lengur"},
    {"T": "fol. 18r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "explicit passus quintus incipit passus sextus",
     "Line": "Thus y awakede wot god wan y wonede on cornhulle"},
    {"W": "fol. 20v",
     "HeightInLines": 5,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "explicit passus sextus incipit passus septimus",
     "Line": "With þat ran repetaunce and rehersed his teme"},
    {"T": "fol. 26r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "explicit passus septimus Incipit passus octauus",
     "Line": "Tho cam sleuthe al by slobered wiþ to slymed eyen"},
    {"T": "fol. 29v",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "explicit passus soctauus Incipit passus nonus",
     "Line": "Tho seide perkyn plouhman by seint petur of Rome"},
    {"T": "fol. 34r",
     "HeightInLines": 7,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "hic explicit passus nonus Incipit passus decimus",
     "Line": "Treuthe herde telle her of and to Peres sente"},
    {"T": "fol. 38r-v",
     "HeightInLines": 7,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Explicit visio Willi de Petro plouhman/Incipit visio eiusdem Willmi de dowel",
     "Line": "Thus robed in russet y romed a boute"},
    {"T": "fol. 42r",
     "HeightInLines": 7,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "hic explicit passus primus incipit passus secundus de dowel",
     "Line": "Then hadde wit a wyf was hote dame studie"},
    {"A": "fol. 46r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "hic explicit passus secundus hic incipit passus tercius",
     "Line": "Alas eye quaþ Eolde and holynes boþe"},
    {"Ȝ": "fol. 49v",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "hic explicit passus tercius Incipit passus quartus de dowel",
     "Line": "Ȝe wel worþe pouerte for he may walke unrobbed"},
    {"I": "fol. 52v",
     "HeightInLines": 8,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Hic explicit passus quartus Incipit passus quintus de dowel",
     "Line": "I am ymaginatyf quod he Idul was y neuere"},
    {"A": "fol. 55v",
     "HeightInLines": 7,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "hic explicit passus quintus Incipit passus sextus de dowel",
     "Line": "And y a waket þer wiþ witles nerhande"},
    {"A": "fol. 59r-v",
     "HeightInLines": 7,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Hic Explicit passus sextus Incipit passus septim9 de dowel",
     "Line": "Alas þat richesse schal rene & robbes mannes soule"},
    {"T": "fol. 64r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "hic explicit passus septim9 & vltim9 de dowel/hic incipit passus primus de dobet",
     "Line": "Ther is non suche y sayde þt som tyme ne borweþ"},
    {"L": "fol. 68r",
     "HeightInLines": 5,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Hic explicit passus primus Incipit passus scds de dobet",
     "Line": "Leue libũ arbitrũ quaþ y y leoue as y hope"},
    {"I": "fol. 72r",
     "HeightInLines": 8,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Hic explicit passus secundus incipit Passus tercius de dobet",
     "Line": "I am spes aspye quaþ he and spure aftur a knyht"},
    {"F": "fol. 76r",
     "HeightInLines": 7,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Hic explicit passus tercius/ de dobet / hic incipit passus quartus/ [de dobet]",
     "Line": "F(?)ol werþ and wetschod wente y forþe aftur"},
    {"T": "fol. 82r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Explicit passus quartus et vltimus de dobet/Incipit primus passus de dobest",
     "Line": "Thus y awakede & wrot what y hadde dremed"},
    {"A": "fol. 88r",
     "HeightInLines": 6,
     "Color": "blue",
     "Filigree": null,
     "Rubric": "Explicit passus prim9 de dobest incipit passus secundus",
     "Line": "And as y wente by þw qy when y was þus awaked"}
  ],

  "PrintedEdition(s)": null
}

 

2 thoughts on “@context for Trinity College Dublin MS 212”

  1. Matthew Davis, postdoctoral fellow in Data Curation in Medieval Studies at North Carolina State, has kindly allowed me to paste our facebook conversation in the comments:
    Matthew Davis Is there a reason why you’re starting with JSON instead of translating from the metadata for the catalog descriptions or the XML of the text?

    Matthew Davis (sorry, that probably sounded snarky. It was not meant to be)

    Angie Bennett Segler There is a reason, and it’s primarily because JSON is (for me) more easily human readable. So, if I produce a JSON object describing a MS, it’s just as easy for me to read it and get all the pertinent information as it is for a machine. I also don’t lose any of the flexibility of RDF/XML formats because it’s possible to write a JSON-LD context that will convert my JSON data into triples (then readable just like any RDF/XML). Honestly, it’s about what’s easiest to read and write for people who are not already proficient in coding.

    Angie Bennett Segler (It’s all a part of my secret plan to convince other manuscript scholars that they can do it too and should do it with me!)

    Matthew Davis Fair enough; I’m just thinking that there’s already validated Piers XML from the Piers Plowman Electronic Archive that you could crosswalk to JSON.

    Angie Bennett Segler Also, I’m not primarily interested in text mark-up, since most of my data is not about the text in the manuscript, but the physical object. At present, I do not have access to the XML in PPEA, though that is something I’m working on remedying.

    Angie Bennett Segler I’m more interested in my data LINKING to PPEA than recapitulating the work they’re already doing. Text and object together in the Semantic Web.

    Angie Bennett Segler And one last thing in your first question that I failed to respond to: for most MSS, there does not appear to be XML catalog descriptions yet, so I wouldn’t gain ease of integration on that front, at least not yet. BUT as catalogs go online I will have to plan for that, which is why in two weeks I’m going to talk about Uniform Resource Identifiers and how to make JSON-LD and RDF/xml talk to one another.

    Angie Bennett Segler Thanks for your input, though. I’m only learning this myself as I go along (certainly not claiming to be an expert!), so I am more than interested in comments, feedback, and suggestions on how to proceed. In fact, if you felt so inclined I’d be happy if you posted to the blog proper so that me and all TEN of my followers (hey…it’s only a week old!) can also be a part of this conversation about how to generate data that is link-able to existing data and meta-data.

    Matthew Davis Well, I have some concerns about imposing context as defined by linked data on the material object as being an inherently flattening experience, but I may pop up from time to time. If you want to paste this conversation in that’s ok by me.

    Angie Bennett Segler Is it just a shameless plug for me to say that I too have concerns about flattening and have talked about why I’m doing what I’m doing to material objects in the content blog from this week?
    https://materialpiers.wordpress.com/2014/05/16/the-digital-material-nexus/

  2. Excellent back and forth about JSON and XML. I like JSON because as does XML it allows complex graph-like descriptions of phenomena. That is, it can be “non-flat”. The “-LD” addition to JSON lets one use the @context to render the complex structure into rdf-triples for processing in other tools. RDF is again inherently “non-flat”.
    But to step back… the above paragraph is probably dense enough that it needs an example. Angie is definitely leading the way here. I intend to catch up by way of the (geo)json-ld I’m building for Roman Amphitheaters at http://purl.org/roman-amphitheaters . There are some non-flat structures in there. See https://github.com/sfsheath/roman-amphitheaters for examples of converting that data into different formats. That’s one indication that JSON-LD can be both well-suited to complex data structures and support reusability in many tools.
    I’m sure Matthew would have further queries about this comment as it’s very incomplete. I mean it to be an indication that I think this is an important discussion, one which should lead to “choosing right tool for right situation,” rather than any huge philosophical split.
