Digital Musicology #3: Building networks of music sources

Building networks of 16th-century music sources: what about the data?
Marnix van Berchum, University of Utrecht

The notion of ‘distant reading’, introduced by Franco Moretti, has gained wide popularity in the emerging field of Digital Humanities. It is worth asking whether Moretti's ideas can also be applied to musicology. Stephen Rose and Sandra Tuppen mention Moretti as an inspiration for the British Library project ‘A Big Data History of Music’.

But what other methods, besides those presented by Moretti, Rose and Tuppen, and others, could prove useful for historical musicology? In an earlier blog post in this series on Digital Musicology, Peter van Kranenburg discusses the opportunities for modelling musical style from large corpora of digitised compositions.

My research explores how network theory, with its concepts and methodologies, can help in studying the transmission of music in the sixteenth century. The networks I am currently investigating consist of two kinds of entity: the sources and the compositions they contain (see figure 1).

Attributes attached to both entities are valuable as well, e.g. the dates and places of origin of the sources, their physical characteristics, or the composers associated with each composition. This is the type of data that Van Kranenburg calls ‘contextual data’.

Figure 1: A bipartite graph of eleven manuscripts (blue nodes) produced by the scriptorium of Petrus Alamire and currently held at the Thüringer Universitäts- und Landesbibliothek in Jena, Germany, and of the compositions they contain (red nodes).
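To make the idea of such a network concrete, here is a minimal sketch in Python (standard library only) of a bipartite graph linking each source to the compositions it contains, projected onto the sources so that two sources are connected when they share compositions. The source and composition names are hypothetical placeholders, not data from the Jena manuscripts, and a real project would more likely use a dedicated graph library such as NetworkX.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical contents: source -> compositions it contains.
# Placeholder names only; NOT actual data from the Alamire manuscripts.
SOURCES = {
    "Source A": ["Missa 1", "Missa 2", "Motet 1"],
    "Source B": ["Missa 2", "Motet 1"],
    "Source C": ["Missa 3"],
}

def bipartite_edges(sources):
    """Return the set of (source, composition) edges of the bipartite graph."""
    return {(src, comp) for src, comps in sources.items() for comp in comps}

def project_onto_sources(sources):
    """Project the bipartite graph onto the sources.

    Two sources are linked if they share at least one composition;
    the edge weight is the number of compositions they share.
    """
    # Invert the mapping: composition -> set of sources containing it.
    holders = defaultdict(set)
    for src, comps in sources.items():
        for comp in comps:
            holders[comp].add(src)
    weights = defaultdict(int)
    for srcs in holders.values():
        # Every pair of sources sharing this composition gets +1.
        for a, b in combinations(sorted(srcs), 2):
            weights[(a, b)] += 1
    return dict(weights)

print(len(bipartite_edges(SOURCES)))   # 6
print(project_onto_sources(SOURCES))   # {('Source A', 'Source B'): 2}
```

The projection discards the compositions themselves but keeps how strongly two sources overlap, which is one common way to study transmission between sources; the contextual attributes mentioned above could be stored alongside each node.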

The data I use are metadata on sixteenth-century music manuscripts and music prints, the primary sources of the music. Most of these sources are described in the secondary literature produced by more than a century of musicology; many of them have been digitised and are available online.

The secondary literature, however, which very often lists the contents of one or more sources, is mainly available in printed volumes on the shelves of music and university libraries around the world.

Only a small percentage of this information is available in digital form, such as an online database. These databases originate primarily from musicological research projects and, given the nature of such projects, are very often limited in scope. A preliminary exploration of the (meta)data on sixteenth-century music sources (or, more broadly, Early Music sources) available online shows that they differ in many ways. A first inventory can be found in this sheet.

Scope
The scope of the information presented differs hugely between the databases. Every database or project has its own focus on a specific historical period (e.g. one century only, or the whole ‘renaissance’ period), geography (e.g. Europe, the German-speaking areas or the Low Countries) and genre (e.g. only motets or chansons, or no specific genre). Furthermore, the number of sources included ranges from a handful to several thousand.

Entities / attributes
Within its defined scope, each database has made different choices about what information to include. The vdm16 database, for example, has an elaborate section on the physical characteristics of the sources (including, e.g., collation information), but does not provide a table of contents.

The CMME project, on the other hand, provides almost no information on the visual appearance of the sources, but lists the compositions in each source in the order in which they appear.

Standardisation
The different databases standardise information in different ways. A very relevant example is the abbreviation (siglum) used for the manuscript sources: the RISM project provides an authoritative list (used, e.g., in the DIAMM database), but for the period 1400-1550 the abbreviations from the printed reference work Census-Catalogue of Manuscript Sources of Polyphonic Music, 1400-1550 are also very common in musicological literature (and are used, e.g., in the CMME project). A similar issue occurs with composer names: the composer Orlande de Lassus can be found both as ‘Orlande de Lassus’ and as ‘Orlandus Lassus’. Even within a single database, different spellings are used.
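The composer-name problem can be illustrated with a minimal Python sketch that maps spelling variants to one preferred form through a hand-made concordance. The concordance is a hypothetical illustration, not an excerpt from RISM, the New Grove or any database mentioned here; it contains only the two Lassus spellings above plus the Italian form ‘Orlando di Lasso’. Lower-casing, stripping diacritics and collapsing whitespace before lookup makes the match tolerant of small typographical differences.

```python
import unicodedata

# Hypothetical authority list: preferred form -> known variant spellings.
# The choice of preferred form is illustrative only.
AUTHORITY = {
    "Orlande de Lassus": ["Orlandus Lassus", "Orlando di Lasso"],
}

def normalise(name):
    """Lower-case, strip diacritics and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())

# Lookup table: normalised spelling -> preferred form.
CONCORDANCE = {}
for preferred, variants in AUTHORITY.items():
    for spelling in [preferred, *variants]:
        CONCORDANCE[normalise(spelling)] = preferred

def standardise(name):
    """Return the preferred form, or the input unchanged if it is unknown."""
    return CONCORDANCE.get(normalise(name), name)

print(standardise("Orlandus  Lassus"))   # Orlande de Lassus
print(standardise("Josquin des Prez"))   # Josquin des Prez (not in the list)
```

Unknown names are passed through rather than guessed, so that gaps in the concordance stay visible; the same pattern could map RISM sigla to Census-Catalogue abbreviations, given a verified concordance between the two.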

Structure
For the databases investigated, it is not always clear what the underlying data structure is (see also Transparency below). One would presume that a (relational) database management system underlies the web interface, but in most databases this is not explicitly visible. In the Base Chanson database, for example, it is not possible to ‘click through’ from one record to another piece of metadata and see related records; the records are (or at least appear to be) static and not linked to each other.
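What ‘clicking through’ requires can be sketched with a small relational schema in Python's built-in sqlite3 module: composers, compositions and sources sit in separate tables, and a link table records which composition appears in which source and at what position. The table and column names and the sample rows are hypothetical; they do not describe how Base Chanson or any other database is actually built.

```python
import sqlite3

# Hypothetical schema; names are illustrative only.
SCHEMA = """
CREATE TABLE composer    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE source      (id INTEGER PRIMARY KEY, siglum TEXT NOT NULL);
CREATE TABLE composition (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                          composer_id INTEGER REFERENCES composer(id));
-- Link table: which composition appears in which source, and where.
CREATE TABLE contents    (source_id INTEGER REFERENCES source(id),
                          composition_id INTEGER REFERENCES composition(id),
                          position INTEGER);
"""

def build_demo():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO composer VALUES (?, ?)",
                     [(1, "Composer X"), (2, "Composer Y")])
    conn.executemany("INSERT INTO source VALUES (?, ?)",
                     [(1, "Source A"), (2, "Source B")])
    conn.executemany("INSERT INTO composition VALUES (?, ?, ?)",
                     [(1, "Work 1", 1), (2, "Work 2", 2), (3, "Work 3", 1)])
    conn.executemany("INSERT INTO contents VALUES (?, ?, ?)",
                     [(1, 1, 1), (1, 2, 2), (2, 3, 1), (2, 1, 2)])
    return conn

def sources_for_composer(conn, composer_name):
    """'Click through' from a composer to every source containing their works."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.siglum
        FROM composer AS c
        JOIN composition AS w ON w.composer_id = c.id
        JOIN contents AS k ON k.composition_id = w.id
        JOIN source AS s ON s.id = k.source_id
        WHERE c.name = ?
        ORDER BY s.siglum
        """,
        (composer_name,),
    )
    return [siglum for (siglum,) in rows]

conn = build_demo()
print(sources_for_composer(conn, "Composer X"))  # ['Source A', 'Source B']
print(sources_for_composer(conn, "Composer Y"))  # ['Source A']
```

Because each fact is stored once and referenced by key, the same tables answer the reverse questions too (all works in a source, in order, or all sources sharing a work), which is exactly what static, unlinked records cannot offer.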

Transparency
The possibilities for re-using the information offered, not only ‘as is’ from the website but also as ‘raw data’, would be greatly improved if more were known about the characteristics discussed above. The websites, however, are generally not transparent about the information they offer, with positive exceptions such as the Motet Database Catalogue Online.

This project states explicitly that composer names are standardised to conform to the authoritative New Grove Dictionary of Music and Musicians. As I know from my involvement with the CMME project, that database does the same, although it does not say so anywhere on its website. Another aspect of transparency would be information on when the online material was last updated.

In conclusion, no single, standardised dataset on sixteenth-century music is available that I can readily use for my research. Greater standardisation, structuring and transparency of the data currently available in the (small) project databases online would make interlinking these data a feasible and foreseeable task. As with data on the music itself (mentioned by Van Kranenburg), there is still work to be done. Or, to cite Laurent Pugin:

[…] obtaining or accessing high quality datasets remains a serious hurdle, especially on a large scale, in a similar way to accessing sources a couple of decades ago. It is a major barrier that needs to be removed if digital musicology research is to be taken to the next level.