Posted on Monday March 9, 2015

Extending the Europeana Data Model for richer descriptions of sound materials

How can we best represent the particularities of sound metadata using the Europeana Data Model (EDM)? How do we preserve the semantics of the metadata when it is mapped to EDM? These are two of the questions that members of the Europeana Sounds project and EuropeanaTech set out to answer in the report EDM Profile for Sounds.

The main task was to identify the characteristics of sound objects and of the metadata describing them, and to translate these into specifications for EDM.

Maagdenhuis songs (EP). Amsterdam Museum (CC BY).

Describing sound objects using EDM

When using EDM, data providers are asked to describe two main types of resource: a Cultural Heritage Object (CHO), which is what a search on the Europeana.eu portal returns, and the digital representations of these CHOs (called 'WebResources' in EDM terminology).
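To make the two resource types concrete, here is a minimal sketch of one provider record written as plain (subject, predicate, object) tuples rather than with an RDF library. It assumes the core EDM names edm:ProvidedCHO, edm:WebResource, ore:Aggregation, edm:aggregatedCHO, edm:isShownBy and edm:hasView. The URIs, the "#aggregation" naming convention and the `build_record` helper are hypothetical, and treating the first file as edm:isShownBy is only one illustrative choice.

```python
# Sketch: one provider record as (subject, predicate, object) tuples.
# All URIs and the "#aggregation" suffix are hypothetical.
Triple = tuple[str, str, str]


def build_record(cho_uri: str, web_resources: list[str]) -> list[Triple]:
    """Link one CHO to its WebResources through an ore:Aggregation."""
    agg_uri = cho_uri + "#aggregation"  # hypothetical naming convention
    triples: list[Triple] = [
        (cho_uri, "rdf:type", "edm:ProvidedCHO"),
        (agg_uri, "rdf:type", "ore:Aggregation"),
        (agg_uri, "edm:aggregatedCHO", cho_uri),
    ]
    for i, wr_uri in enumerate(web_resources):
        triples.append((wr_uri, "rdf:type", "edm:WebResource"))
        # Illustrative choice: the first file is the main view (edm:isShownBy),
        # any further files are additional views (edm:hasView).
        prop = "edm:isShownBy" if i == 0 else "edm:hasView"
        triples.append((agg_uri, prop, wr_uri))
    return triples


for t in build_record("http://example.org/cho/1",
                      ["http://example.org/cho/1.mp3", "http://example.org/cho/1.wav"]):
    print(t)
```

The CHO itself carries no file information; everything about the files hangs off the aggregation and the WebResources.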

The first challenge for the Task Force was to identify what would qualify as a CHO or a WebResource in the context of sound materials. To do this, we began by distinguishing analogue objects from born-digital objects. In the context of EDM, both can be considered potential CHOs, each associated with one or more digital representations (WebResources). We established the following distinction:

  • An analogue object can be digitised and then represented by many WebResources that vary in format, quality, etc.;
  • A born-digital object can also give rise to many WebResources, each with its own characteristics.

We then differentiated the description of an analogue object from its digital representation(s). This included: 

  • The distinction between a “bare” carrier (a vinyl or shellac disc, with physical characteristics like size) and the sound recording itself (a carrier with sound recorded on its tracks, with attributes like duration);
  • The distinction between a musical work, the multiple performances of this work (events, with attributes like a date or a place) and its (physical) recordings;
  • The distinction between a specific recording and the sound itself (abstract, conceptual).
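The distinctions above can be sketched as separate classes, so that each attribute lives on the entity it actually describes: size on the carrier, date and place on the performance, duration on the recording. The class names and fields are hypothetical and are not taken from the EDM profile or from any ontology; the abstract sound is left out to keep the sketch short.

```python
from dataclasses import dataclass

# Hypothetical classes; each attribute sits on the entity it describes.


@dataclass
class MusicalWork:
    title: str


@dataclass
class Performance:  # an event
    work: MusicalWork
    date: str
    place: str


@dataclass
class Carrier:  # the "bare" physical object
    material: str
    diameter_inches: int


@dataclass
class SoundRecording:  # sound captured on a carrier
    performance: Performance
    carrier: Carrier
    duration_seconds: int


def describe(rec: SoundRecording) -> str:
    p = rec.performance
    return (f"{p.work.title}, performed {p.date} in {p.place}, "
            f"{rec.duration_seconds}s on a {rec.carrier.diameter_inches}-inch "
            f"{rec.carrier.material} disc")
```

With this separation, two pressings of the same performance share one Performance object but point to different Carrier objects, and the work's title is stated once rather than copied into every recording.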

Laughing Kookaburras. The British Library (CC BY SA).

Further distinctions could be made if we also considered all the related resources that could be associated with the objects mentioned above, such as images, music scores, programme notes and so on.

Making these distinctions is key from a semantic point of view, because it determines which EDM classes need to be used and where the metadata needs to be applied.

EDM core classes for data providers. Valentine Charles, Europeana (CC BY SA).

Identifying the characteristics of sound objects and their equivalents in EDM

The next step was to identify what is specific to the metadata of sound objects and to see whether these needs could be met by EDM as currently defined.
EDM has been designed to be extensible: when the declared EDM classes and properties do not sufficiently represent the knowledge in the metadata, the model allows specialisations.
In most of the cases identified by the Task Force, extensions of EDM were needed to support more granular metadata. The extensions have been taken from existing open data models, namely EBUCore, the Music Ontology and Dublin Core. These new properties have been declared as specialisations of existing ones. The main additions are listed below.
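Declaring a new property as a specialisation (in RDF Schema terms, rdfs:subPropertyOf) means that every statement using it also implies a statement with the more general property. The sketch below shows why this helps: a consumer that only understands core EDM can still see an ebucore:duration value as dcterms:extent. The `SUBPROPERTY_OF` table holds only the one relation mentioned in this post, and the `generalise` helper is hypothetical.

```python
# Hypothetical table of specialisations: sub-property -> super-property.
SUBPROPERTY_OF = {
    "ebucore:duration": "dcterms:extent",
}

Triple = tuple[str, str, str]


def generalise(triples: list[Triple]) -> list[Triple]:
    """Add the inferred super-property statements (rdfs:subPropertyOf entailment)."""
    out = list(triples)
    for s, p, o in triples:
        sup = SUBPROPERTY_OF.get(p)
        while sup is not None:  # follow chains of specialisations upwards
            out.append((s, sup, o))
            sup = SUBPROPERTY_OF.get(sup)
    return out


record = [("cho:1", "ebucore:duration", "PT3M25S"), ("cho:1", "dc:title", "Example")]
print(generalise(record))
```

Because the specialised statement is kept alongside the inferred one, richer consumers lose nothing while simpler ones still get a usable value.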

Distinguishing the master version among the digital representations of a CHO

Since EDM allows the description of several WebResources (or digital representations) per Cultural Heritage Object [http://pro.europeana.eu/share-your-data/data-guidelines/edm-case-studies/mimo-edm], the Task Force identified the need to distinguish the master version from the others. We defined the master version not as the best-quality version, but as the version that has particular significance in the life of the object. For instance, the first recording of a concert might have more historical importance than the different mixes of the same recording made later. The properties needed to represent this situation have been added to EDM.
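Because the master version is chosen for its significance rather than its quality, it has to be flagged explicitly; it cannot be computed from technical properties. In this sketch the `is_master` flag is a hypothetical stand-in for whatever property the profile uses, and the rule that a CHO has at most one master is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WebResource:
    uri: str
    bitrate_kbps: int
    is_master: bool = False  # hypothetical stand-in for the profile's property


def pick_master(resources: list[WebResource]) -> Optional[WebResource]:
    """Return the flagged master version, or None if none is flagged."""
    masters = [r for r in resources if r.is_master]
    if len(masters) > 1:  # assumption: at most one master per CHO
        raise ValueError("more than one master version flagged")
    return masters[0] if masters else None


def pick_best_quality(resources: list[WebResource]) -> WebResource:
    """A purely technical choice, which is not the same thing as the master."""
    return max(resources, key=lambda r: r.bitrate_kbps)


versions = [
    WebResource("http://example.org/concert-first-recording.wav", 256, is_master=True),
    WebResource("http://example.org/concert-remix.flac", 900),
]
print(pick_master(versions).uri)        # the historically significant version
print(pick_best_quality(versions).uri)  # the highest bit rate
```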

Duration in sound CHOs and WebResources

Duration is an important feature of sound objects. EDM currently uses the general Dublin Core property dcterms:extent to express either the size or the duration of a resource. The Task Force defined the new property ebucore:duration as a subproperty of dcterms:extent to capture duration information specifically.
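A dedicated duration property also makes a machine-readable value possible, where a dcterms:extent value may be free text such as "3 min. 25 sec.". This post does not say which value format the profile expects for ebucore:duration; the sketch below assumes an ISO 8601 duration string (the lexical form of xsd:duration), and the helper name is hypothetical.

```python
def iso8601_duration(total_seconds: int) -> str:
    """Format whole seconds as an ISO 8601 duration (xsd:duration lexical form)."""
    if total_seconds < 0:
        raise ValueError("duration cannot be negative")
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    value = "PT"
    if hours:
        value += f"{hours}H"
    if minutes:
        value += f"{minutes}M"
    if seconds or value == "PT":  # always emit something, e.g. "PT0S"
        value += f"{seconds}S"
    return value


# The specialised statement; a consumer that only knows dcterms:extent can still
# infer the general one through the subproperty declaration.
statement = ("http://example.org/cho/1", "ebucore:duration", iso8601_duration(205))
print(statement)
```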

Track information for sound CHOs and WebResources      

Metadata descriptions of audio recordings and audio-related materials often contain information about the tracks that compose them. The Task Force recommended using more detailed properties in addition to the generic dcterms:extent or dc:description that would be used under the current EDM schema. The new properties capture the number of tracks, the track side and the track number.
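Track information often arrives as free text. The sketch below shows how a provider might split a string such as "Side A, track 3 of 12" into the three values the new properties are meant to carry. The input pattern and the dictionary keys are hypothetical (the actual property names are not listed in this post), and text that does not match would simply stay in dc:description.

```python
import re
from typing import Optional

# Hypothetical input pattern, e.g. "Side A, track 3 of 12" or "side b, track 1".
TRACK_RE = re.compile(
    r"side\s+(?P<side>[A-Z0-9]+)\s*,\s*track\s+(?P<number>\d+)(?:\s+of\s+(?P<count>\d+))?",
    re.IGNORECASE,
)


def parse_track(text: str) -> Optional[dict]:
    """Return side, number and (if present) count; None if the text does not match."""
    m = TRACK_RE.search(text)
    if m is None:
        return None  # leave the original text in dc:description
    result = {
        "track_side": m.group("side").upper(),
        "track_number": int(m.group("number")),
    }
    if m.group("count"):
        result["track_count"] = int(m.group("count"))
    return result


print(parse_track("Side A, track 3 of 12"))
```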

Other technical metadata        

The Task Force also selected a list of additional technical metadata, which could either be supplied by data providers or obtained automatically by Europeana. These properties are very important when it comes to the re-use of media files by third parties. They capture information such as the MIME type, bit rate, file size and audio encoding format. All of these properties have been re-used from the EBUCore model.
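Some of these values can be derived from the media file itself. The sketch below guesses the MIME type from the file extension with Python's standard `mimetypes` module and estimates an average bit rate from file size and duration. The dictionary keys are hypothetical rather than EBUCore property names, and the bit rate is only an approximation because the file size includes container overhead; a real pipeline would read these values from the file's headers.

```python
import mimetypes
from typing import Optional


def technical_metadata(filename: str, size_bytes: int, duration_seconds: float) -> dict:
    """Derive a few technical values; keys are hypothetical, not EBUCore names."""
    mime_type, _encoding = mimetypes.guess_type(filename)  # based on extension only
    avg_kbps: Optional[int] = None
    if duration_seconds > 0:
        # Average over the whole file, container overhead included.
        avg_kbps = round(size_bytes * 8 / duration_seconds / 1000)
    return {
        "mime_type": mime_type,
        "file_size_bytes": size_bytes,
        "average_bit_rate_kbps": avg_kbps,
    }


print(technical_metadata("track01.mp3", 4_800_000, 300))
```

A 4.8 MB file lasting five minutes works out at 4,800,000 × 8 / 300 = 128,000 bits per second, i.e. 128 kbps.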

Different types of dates for sound CHOs or WebResources

The diversity of sound objects identified by the Task Force raised the need to manage different types of dates in EDM. A lot of time can pass between the creation of a musical work, its recording, the digitisation of that recording and its publication, so it is important to capture all of these dates.
Capturing them also makes it possible to represent situations in which the access rights for an object have changed over time. The Task Force has identified the following dates as relevant for the EDM profile:

  • Date of creation: It can be used to describe a date of composition (for a given work), or a date of recording (for a performance of a given work) 
  • Date of publication
  • Date of digitisation
  • Date of copyright
  • Date of modification
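Keeping each date in its own property lets them be compared. The sketch below uses one plausible mapping to Dublin Core terms (dcterms:created, dcterms:issued, dcterms:dateCopyrighted, dcterms:modified); it is an assumption, not necessarily the mapping chosen in the profile, and since Dublin Core has no specific term for digitisation, the `ex:digitisationDate` property is hypothetical. For brevity all dates sit in one dictionary, although in practice some (such as digitisation) would describe a WebResource rather than the CHO. A simple check flags records where publication or digitisation precedes creation.

```python
from datetime import date

# One plausible mapping (an assumption, not necessarily the profile's choice).
DATE_PROPERTIES = {
    "creation": "dcterms:created",
    "publication": "dcterms:issued",
    "digitisation": "ex:digitisationDate",  # hypothetical: no specific DC term
    "copyright": "dcterms:dateCopyrighted",
    "modification": "dcterms:modified",
}


def date_triples(subject: str, dates: dict[str, date]) -> list[tuple[str, str, str]]:
    """One statement per date type, with ISO 8601 date values."""
    return [(subject, DATE_PROPERTIES[kind], d.isoformat()) for kind, d in dates.items()]


def check_order(dates: dict[str, date]) -> list[str]:
    """Flag dates that should not precede the creation of the work or recording."""
    problems = []
    created = dates.get("creation")
    if created is not None:
        for kind in ("publication", "digitisation"):
            if kind in dates and dates[kind] < created:
                problems.append(f"{kind} precedes creation")
    return problems


dates = {"creation": date(1910, 5, 1), "publication": date(1911, 1, 1),
         "digitisation": date(2012, 3, 4)}
print(date_triples("http://example.org/cho/1", dates))
print(check_order(dates))
```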

Hierarchical relationships and collections

Another characteristic of sound objects is their structural complexity: a musical work may be described in several movements, and a recording may be distributed over many tracks. It is therefore important to be able to represent the hierarchical relationships between the different levels that make up an object. This part of the discussion was based entirely on the work done by the Task Force on hierarchical objects [http://pro.europeana.eu/get-involved/europeana-tech/europeanatech-task-forces/hierarchical-objects]. The group also spent time defining the notion of a collection and how to represent it in EDM. This type of modelling is particularly relevant for the ethnographic recordings that will be provided to Europeana, as a lot of their contextual information is described at the level of the collection rather than on the individual recordings.
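The sketch below assumes the approach EDM uses for hierarchical objects, where each part points to its parent with dcterms:isPartOf and to its preceding sibling with edm:isNextInSequence. It shows two things: ordering the tracks of a record, and resolving contextual metadata that is recorded only at collection level by walking up the hierarchy. The URIs, the metadata values and the idea of inheriting values from ancestors are illustrative assumptions, not rules of the profile.

```python
from typing import Optional

# Hypothetical hierarchy: part -> parent (dcterms:isPartOf).
IS_PART_OF = {
    "track:1": "record:1",
    "track:2": "record:1",
    "record:1": "collection:fieldwork",
}
# Hypothetical ordering: part -> preceding sibling (edm:isNextInSequence).
IS_NEXT_IN_SEQUENCE = {"track:2": "track:1"}
# Contextual metadata, stored only where the provider recorded it.
METADATA = {
    "collection:fieldwork": {"dcterms:spatial": "Example village",
                             "dc:creator": "Example collector"},
    "track:1": {"dc:title": "Example song"},
}


def ancestors(uri: str) -> list[str]:
    """Parents from nearest to furthest."""
    chain = []
    parent = IS_PART_OF.get(uri)
    while parent is not None:
        chain.append(parent)
        parent = IS_PART_OF.get(parent)
    return chain


def resolve(uri: str, prop: str) -> Optional[str]:
    """Value on the object itself, else on the nearest ancestor that has it."""
    for node in [uri] + ancestors(uri):
        value = METADATA.get(node, {}).get(prop)
        if value is not None:
            return value
    return None


def ordered_parts(parent: str) -> list[str]:
    """Order the direct parts of `parent` by following edm:isNextInSequence links."""
    parts = {c for c, p in IS_PART_OF.items() if p == parent}
    following = {prev: c for c, prev in IS_NEXT_IN_SEQUENCE.items() if c in parts}
    # The first part is the one that does not follow any other part.
    current = next((c for c in sorted(parts) if c not in IS_NEXT_IN_SEQUENCE), None)
    order = []
    while current is not None:
        order.append(current)
        current = following.get(current)
    return order


print(ordered_parts("record:1"))
print(resolve("track:2", "dcterms:spatial"))
```

Storing the place of recording once on the collection and resolving it on demand avoids copying it into every track, which matches the situation described above for ethnographic recordings.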

Anges musiciens. Artist Unknown. National Library of France (Public Domain).

In addition to sound objects, Europeana gathers a great deal of related material such as music scores, videos of performances, manuscripts, engravings and musical instruments, all of which play an important role in contextualising the other objects. By bringing together and connecting these diverse objects, EDM now offers the chance to make Europeana a world of music.