When Metis (the application that Europeana uses to run its ingestion, aggregation and enrichment pipeline) was conceived, it was designed not only to make the Europeana Foundation’s work more efficient, but also to be a system that Europeana’s data partners can work with to make data processing easier and more rewarding for them.
In the early days of Metis, only certain functionalities were available to aggregators, which mainly allowed them to check whether data could be accepted for ingestion in Europeana. But now that the Metis Sandbox has been released as a tool, all Metis workflow steps for testing data ingestion can be performed in one go through a user-friendly interface which also allows data to be previewed as if it were on the Europeana website. With this, the Europeana Initiative is getting closer to the ambitions set forth in the aggregation strategy to help speed up dataset updates, involve contributors in testing and encourage data enrichment. This will in turn benefit the quality of our data and contribute to building capacity in aggregators.
What aggregators have to say about working with the Metis Sandbox
Tom Miles, Metadata Coordinator of the Europeana Sounds aggregator at the British Library, and Kerstin Arnold, Manager (COO) of the Archives Portal Europe Foundation (APEF), have been using the Metis Sandbox since the early days. Cosmina Berta, Advisor for Project Management, Tools and Workflows, and her colleagues from the Deutsche Digitale Bibliothek were involved with the Sandbox even earlier, from the pilot stages in the Europeana Common Culture project, and continue to use the Metis Sandbox regularly in their work.
For Kerstin, working with the Metis Sandbox has streamlined the communication workflow between APEF, the Europeana Foundation and the various archives providing data via APEF. ‘With the Metis Sandbox I can identify potential data issues myself before submitting the data to Europeana. I can also distinguish directly between issues that will have to be addressed by the archive and issues that APEF has to work on, like adaptations of the conversion to the Europeana Data Model (EDM).’
For Tom, it has been useful to see how a dataset is going to look when it is published on Europeana, because it’s not always easy to visualise the display from spreadsheets and XML files. It is now possible to see, for example, whether more information should go in the title, whether there is too little or too much information in the description field, or whether the subject terms work properly. ‘It’s been really useful having access to the Metis Sandbox for the Microsoft books dataset - I’ve been able to review this dataset using the Sandbox and spotted several things that needed changing.’
For Cosmina, ‘the Metis Sandbox is very successful in the Europeana aggregator community and we, the Deutsche Digitale Bibliothek, are extremely proud to have contributed to the developments of such a practical tool. We use it for all our Europeana deliveries and it is so helpful to be able to flag issues early enough in the data delivery, so that we can correct them efficiently. In our opinion, the main advantage is that we can show the data partner - who are the very source of the data - what the consequences of thorough cataloguing and mapping are, or the effects of bad quality and good quality data in the Europeana website. So we definitely see the Metis Sandbox as a good “learning resource” for all those involved in the data delivery process.’
What else can the Metis Sandbox do?
The Metis Sandbox is particularly useful when working with new datasets. It helps to provide a sense of the overall quality of the dataset and to confirm whether an expected or required level of data quality can be achieved. The Metis Media Processing module, which is embedded in the Sandbox, is especially helpful here: it attempts to extract technical metadata from the media resources linked in each record, and in doing so helps to prevent broken links as far as possible. Similarly, for aggregators who do not have their own EDM validation tool, the Metis Sandbox can be essential for getting more immediate feedback and fixing data issues.
The Metis Sandbox also shows its strengths when experimenting with new technologies or data formats. Several aggregators have started working with the International Image Interoperability Framework (IIIF) - a set of open standards for delivering high-quality, attributed digital objects online at scale. Mapping IIIF resources properly to EDM is more complex than mapping traditional media links. With small samples in the Metis Sandbox, aggregators can easily try out how a mapping affects the display and quality of the content. Experiments like this can also help to advance how the metadata and content tiers are calculated, thereby extending the use cases covered by the Europeana Publishing Framework.
Find out more
This news post is the second in our January Europeana Pro News focus on the Metis Sandbox! Keep following Europeana Pro news for more - our next post will explore how cultural heritage institutions work with the Metis Sandbox to deliver high-quality data. You can also find out more about how to share your data with Europeana.