Innovating metadata aggregation in Europeana via linked data

Aggregating linked data has the potential to improve data interoperability at global scale. A linked data pilot, developed as part of the Europeana Common Culture project, has investigated this potential, and in this post we take a look at what it achieved.

Aggregating linked data

Linked data is a way of publishing structured data on the web that allows metadata to be connected and enriched. As a result, different representations of the same content can be found, and links can be made between related resources. Aggregating linked data has the potential to bring cost benefits and improve data interoperability at global scale, and the Europeana Common Culture project investigated the feasibility of using linked data for aggregation.

Europeana already operates a scalable and sustainable metadata aggregation model for the cultural heritage sector. Aggregating linked data would make it easier for data providers to share their metadata with cultural heritage aggregators that make use of linked data. Providers not yet publishing linked data who implement it in order to participate in Europeana would also gain the ability to use their linked data in other applications and in domains beyond cultural heritage, such as Internet search engines.

Working with data providers

This pilot ran from May 2019 to June 2020. It was coordinated by the Netherlands Institute for Sound and Vision (NISV) and delivered in close collaboration with the Dutch Digital Heritage Network (NDE), which supported the project by providing knowledge, software and infrastructure to run the tests. The pilot involved three types of participants in the Europeana ecosystem: data providers, aggregators and the Europeana Foundation. Twelve data providers joined the pilot, but not all of them were fully aware of the technical challenges that this novel approach would bring. Four of the providers were not able to deliver a dataset as linked data, and two others delivered datasets with insufficient data for aggregation into Europeana.

In the six successful cases, five providers already had in-house knowledge or an existing implementation of linked data; for the sixth, it was a first effort at publishing linked data. Our conclusion is that there is much interest in implementing linked data among data providers, but that doing so requires significant resources when an organisation has no previous experience.

Pilot outcomes

The pilot applied an approach to linked data aggregation based on two specifications for delivering a linked dataset to Europeana. These had previously been applied successfully in a small-scale pilot in the Rise of Literacy project.

The first specification states that dataset-level metadata should be provided using well-known vocabularies. It describes the kinds of dataset distributions that data providers can use, and the required metadata for each.
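To make the idea of dataset-level metadata more concrete, here is a minimal sketch in Python of a dataset description with one downloadable distribution, expressed as JSON-LD. It is an illustration under stated assumptions and not the text of the specification: the choice of DCAT and Dublin Core Terms as vocabularies, the list of required properties, the function names and the example.org URLs are all hypothetical.

```python
import json

# Assumed vocabularies: DCAT and Dublin Core Terms. The post only says
# "well-known vocabularies", so this choice is illustrative.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical minimum set of properties an aggregator might require.
REQUIRED = ("dcterms:title", "dcat:distribution")


def build_dataset_description(dataset_uri, title, download_url, media_type):
    """Return a JSON-LD dict describing a dataset with one distribution."""
    return {
        "@context": CONTEXT,
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": media_type,
            }
        ],
    }


def missing_properties(description):
    """List the required properties that are absent or empty."""
    return [prop for prop in REQUIRED if not description.get(prop)]


# Hypothetical dataset published as an N-Triples dump.
description = build_dataset_description(
    "https://example.org/datasets/paintings",
    "Paintings collection",
    "https://example.org/dumps/paintings.nt",
    "application/n-triples",
)
print(json.dumps(description, indent=2))
print(missing_properties(description))
```

An aggregator could run a check like `missing_properties` before harvesting, so that an incomplete dataset description is reported to the provider rather than failing later in the workflow.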

The second specification addresses the use of Schema.org linked data for describing cultural heritage objects according to the requirements of Europeana and the Europeana Data Model (EDM). Currently, Europeana only supports ingestion of metadata in EDM. However, experiments on applying Schema.org to metadata descriptions of cultural heritage objects have shown that it can provide good-quality data that is capable of fulfilling the requirements of Europeana. This specification provides general guidance on the use of Schema.org metadata that, after conversion to EDM, will result in metadata suitable for aggregation by Europeana.
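The conversion step mentioned above can be pictured with a small Python sketch that maps a few Schema.org properties onto EDM property names. This is only an illustration: the mapping table is an assumption chosen for the example, not the mapping defined by the specification or implemented in the toolset, and the function names and the sample object are hypothetical.

```python
# Illustrative mapping from Schema.org properties to EDM properties.
# The real specification defines its own, more complete rules.
SCHEMA_TO_EDM = {
    "name": "dc:title",
    "description": "dc:description",
    "creator": "dc:creator",
    "dateCreated": "dcterms:created",
}


def as_list(value):
    """Normalise a missing, single or repeated value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def schema_to_edm(obj):
    """Convert a Schema.org description (as a dict) to an EDM-like dict."""
    cho = {"rdf:about": obj.get("@id"), "rdf:type": "edm:ProvidedCHO"}
    for schema_prop, edm_prop in SCHEMA_TO_EDM.items():
        values = as_list(obj.get(schema_prop))
        if values:
            cho[edm_prop] = values
    return cho


# Hypothetical Schema.org description of a cultural heritage object.
painting = {
    "@id": "https://example.org/objects/42",
    "@type": "Painting",
    "name": "View of a harbour",
    "creator": ["Unknown artist"],
    "dateCreated": "1650",
}
print(schema_to_edm(painting))
```

Properties absent from the source description, such as `description` here, are simply left out of the result; a real conversion would also need to check that the mandatory EDM elements are present.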

The pilot also resulted in a toolset for linked data aggregation that is designed for use by Europeana aggregators and by aggregators in similar networks. Although the toolset functionality is tailored to EDM, aggregators using other data models may add their own conversions and validations using the standards implemented by the toolset. The toolset is based on Docker containers, which preserve the technical independence of its tools. This makes the solution portable to different environments and scalable, so that it can be applied to small or large collections. The toolset and its source code are available on GitHub.

Future work

A number of areas for future work have been identified. Data providers would benefit from tools for preparing their linked data. The validation tools implemented in the toolset can also be used to create services for data providers, allowing them to check the validity of their data at earlier stages of linked data publication. An initial step in this direction was taken by testing the aggregated linked data using the Europeana Metis Sandbox. A second line of work, starting in 2021, will focus on components for interoperability and integration of the toolset into aggregators’ systems. This work will be coordinated by the Netherlands Institute for Sound and Vision in the Dutch national project Digitale Collectie.

To find out more about linked data, watch our webinar from October 2020 about LODA - the Linked Open Data Aggregator, and if you are interested in the topic and would like more chances to discuss it, join the EuropeanaTech community.