Posted on Monday March 22, 2021

Updated on Tuesday June 8, 2021

Clemens Neudecker

Project Coordinator, Berlin State Library

EuropeanaTech Challenge for Europeana AI/ML Datasets: announcing the winners!

With lowered barriers to access and the development of new practices for Artificial Intelligence (AI), it’s no surprise that AI-related activities in the cultural heritage sector are increasing, a topic we are focusing on this month on Europeana Pro. In this post, we are delighted to introduce the winning projects from EuropeanaTech’s first Challenge for Europeana AI/ML Datasets!

Etching of women working in a factory
Title: Carding, drawing and roving.
Institution: Wellcome Collection
Country: United Kingdom
CC BY

Methods from the field of Artificial Intelligence (AI) and Machine Learning (ML) have helped push technological boundaries in various domains, including in the cultural heritage sector (the Interim Report of the EuropeanaTech AI in relation to GLAMs Task Force and the AI4LAM initiative provide some examples). To encourage innovation in this area, a few weeks ago EuropeanaTech announced its first Challenge for Europeana AI/ML Datasets. With this new activity, we wanted to stimulate the creation of datasets for the GLAM sector that can be used for AI/ML, drawing from the rich cultural heritage resources available in Europeana. We hope that the availability of such datasets could help to foster more engagement with digital cultural heritage data in AI/ML and support the transfer of recent advances in AI/ML to the field of digital curation and analysis of cultural heritage content.

We received a total of five proposals, which were carefully reviewed by members of the EuropeanaTech Steering Group and AI in relation to GLAMs Task Force. They assessed the proposals based on their relevance for the GLAM sector (25%), relevance for AI/ML (25%), relation to Europeana (30%) and clarity of the description and work plan (20%).
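To make the weighting concrete, here is a minimal sketch of how the four criteria could be combined into a single score. Only the percentages come from the text above; the weighted sum itself, the 0-10 scale and the example scores are assumptions for illustration, and the reviewers' actual procedure is not described in this post.

```python
# Weights as stated in the post; everything else below is hypothetical.
WEIGHTS = {
    "glam_relevance": 0.25,      # relevance for the GLAM sector
    "aiml_relevance": 0.25,      # relevance for AI/ML
    "europeana_relation": 0.30,  # relation to Europeana
    "clarity": 0.20,             # clarity of the description and work plan
}


def weighted_score(scores):
    """Combine per-criterion scores (assumed 0-10 scale) into one weighted total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for an imaginary proposal.
example = {
    "glam_relevance": 8,
    "aiml_relevance": 6,
    "europeana_relation": 10,
    "clarity": 7,
}
total = weighted_score(example)
print(round(total, 2))
```

Because the weights sum to 100%, the total stays on the same scale as the individual scores.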

Announcing the winners

Named Entities in Archeological Texts

This proposal from a team based at the University of Naples 'L'Orientale' aims to create a dataset for Named Entity Recognition (NER) and Term Extraction for archeological terms in Italian and English in the Europeana Archeology collection. NER is the process of identifying proper names, such as person names or locations, in unstructured text. Term Extraction is similar, but focuses on finding specialised terms, in this case from the archeology domain. Vocabularies like Getty and CIDOC CRM will be considered. The final dataset could be used in the development and evaluation of AI/ML-based technologies for NER in the archeology domain.
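As a toy illustration of what NER and Term Extraction output looks like, the sketch below tags words by looking them up in a small dictionary. This is not the team's method: the gazetteer entries, labels and sample sentence are invented for the example, and real systems typically rely on trained models and full controlled vocabularies rather than a hand-written list.

```python
import re

# Hypothetical mini-gazetteer, invented for this example.
GAZETTEER = {
    "pompeii": "LOCATION",
    "naples": "LOCATION",
    "amphora": "TERM",
    "mosaic": "TERM",
}


def tag_entities(text):
    """Return (surface form, label, start offset) for every gazetteer hit."""
    hits = []
    for match in re.finditer(r"\w+", text):
        label = GAZETTEER.get(match.group().lower())
        if label:
            hits.append((match.group(), label, match.start()))
    return hits


sample = "An amphora and a mosaic fragment were found near Pompeii."
entities = tag_entities(sample)
for surface, label, start in entities:
    print(f"{start:>3}  {label:<8}  {surface}")
```

A dataset like the one proposed would provide such annotations, checked by humans, so that automatic taggers can be trained and evaluated against them.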

Reviewers particularly appreciated the clear structure and maturity of the proposal: a mock dataset had already been built using Europeana’s APIs to test the proposed approach. The bilingual aspect and the scarcity of similar open resources for the archeology field were also seen as particularly valuable.

Zac Grace

This proposal by a student of the Ecole Nationale d'Ingénieurs de Tarbes aims to create pixel masks for semantic segmentation, through manual annotation of image data in the Europeana Fashion collection. This means that, for example, when an image is analysed, the relevant fashion elements (shirt, trousers, shoes) in the image are then marked with their pixel outlines. Such data can be used for training an automated segmentation system.

Example illustration of a pixel mask for semantic segmentation for four image segments: Background (black), Bear (brown), Rose (green), Bucket (blue)
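For readers unfamiliar with the format, a pixel mask can be thought of as a grid with the same dimensions as the image, where each cell holds the id of the class that pixel belongs to. The sketch below uses the four segments named in the caption above; the numeric ids and the tiny 4x6 grid are made up for illustration and do not come from the actual dataset.

```python
# Class names follow the caption; the ids and the grid are hypothetical.
CLASSES = {0: "Background", 1: "Bear", 2: "Rose", 3: "Bucket"}

# One class id per pixel, row by row.
mask = [
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 2, 0],
    [0, 1, 1, 3, 3, 0],
    [0, 0, 0, 3, 3, 0],
]


def pixel_counts(mask):
    """Count how many pixels are assigned to each class."""
    counts = {name: 0 for name in CLASSES.values()}
    for row in mask:
        for class_id in row:
            counts[CLASSES[class_id]] += 1
    return counts


counts = pixel_counts(mask)
print(counts)
```

A segmentation model is trained to predict such a grid for a new image, and its output can be compared with the manually annotated mask pixel by pixel.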

Reviewers liked the clear scope and understanding of the work required to implement the proposal. They also thought that it had a lot of potential for application across different collections.

The Contentious Contexts Corpus

This joint proposal by the KNAW Humanities Cluster and the Centrum Wiskunde & Informatica in the Netherlands aims to establish an annotated corpus of contentious terms in context (ConConCor) from Dutch newspapers in Europeana. The corpus can then be used to bootstrap and evaluate (semi-)automatic methods for detecting such terms in cultural heritage collections. Contentious terms here means words or phrases that suggest some (implicit or explicit) bias towards or against a group, an event, or something else.

Reviewers valued how this proposal addresses a key target of the challenge: the detection of ethical issues and biases that are inherent in digitised cultural heritage collections.

Three stipends of €2,500 each will be made available to the winners so that they can implement their proposals and deliver the corresponding datasets by the end of June 2021.

Find out more

We would like to extend our gratitude to everyone who submitted a proposal to this challenge for their hard work and excellent ideas. We look forward to the implementation of the winning projects and hope that another round will open in the future for those who were not successful this time!

If you would like to hear about more opportunities like this, and to network and collaborate with multidisciplinary technical professionals from around the world, join EuropeanaTech through the Europeana Network Association and follow the community on Twitter.

This post was edited on 16/04/21 to reflect the extended deadline for winners to deliver their datasets. 
