Posted on Friday January 8, 2021

Updated on Monday November 6, 2023

Gregory Markus

EuropeanaTech Community Manager, Netherlands Institute for Sound & Vision

Announcing the EuropeanaTech Challenge for Europeana Artificial Intelligence and Machine Learning datasets

EuropeanaTech is excited to invite proposals for the assembly of Artificial Intelligence/Machine Learning (AI/ML) datasets drawn from the extensive collections on the Europeana website. Two proposals will be selected to receive a financial stipend of €2,500 each, to support the production, documentation and publication of the datasets.

About the call

Methods from the field of Artificial Intelligence and Machine Learning (AI/ML) have helped push technological boundaries in various domains, including in the cultural heritage sector (see examples in the Interim Report of the EuropeanaTech AI in relation to GLAMs Task Force or the AI4LAM initiative).

Many AI/ML methods of interest to applications in GLAMs are supervised; that is, they work by training a predictor (like a neural network) using ground truth (ideal and expected outputs) or labeled data, from which the method is able to learn and infer a model. In order for the model to generalise well and make accurate predictions for a wide array of inputs, its training data need to be of sufficient volume and quality, and representative of the domain from which they are sampled. Otherwise, there is a risk of overfitting (the model will only make good predictions for inputs that are very similar to the training data) or the introduction of biases, which will not only reduce the model’s general applicability and performance, but can also entail ethically problematic or otherwise unintended side-effects.
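
The overfitting risk described above can be illustrated with a deliberately extreme toy example. The sketch below is not drawn from any Europeana dataset: the integer feature, the labelling rule `true_label` and both toy models (`fit_memoriser`, `fit_threshold`) are assumptions invented for illustration. One model simply memorises its training examples, the other learns a single decision threshold.

```python
from collections import Counter


def true_label(x):
    # Hypothetical ground truth: class 1 for values of 50 and above.
    return 1 if x >= 50 else 0


# Labeled training data and separate, unseen test data.
train = [(x, true_label(x)) for x in range(0, 100, 10)]  # 0, 10, ..., 90
test = [(x, true_label(x)) for x in range(5, 100, 10)]   # 5, 15, ..., 95


def fit_memoriser(data):
    """Memorise every training example; guess the majority class otherwise."""
    table = dict(data)
    fallback = Counter(label for _, label in data).most_common(1)[0][0]
    return lambda x: table.get(x, fallback)


def fit_threshold(data):
    """Pick the training value that best separates the two classes."""
    def correct_at(t):
        return sum((1 if x >= t else 0) == y for x, y in data)
    best = max((x for x, _ in data), key=correct_at)
    return lambda x: 1 if x >= best else 0


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


memoriser = fit_memoriser(train)
threshold = fit_threshold(train)

print(accuracy(memoriser, train), accuracy(memoriser, test))  # prints 1.0 0.5
print(accuracy(threshold, train), accuracy(threshold, test))  # prints 1.0 1.0
```

Both models are perfect on the training data, but only the threshold model carries over to unseen inputs. The memoriser is right only for inputs it has already seen, which is the failure mode that sufficiently large and representative training data is meant to guard against.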

The GLAM sector is well positioned for the takeup of AI/ML in the sense that curated data of sufficient volume, quality and diversity, in the form of digital collections from GLAMs (such as those aggregated and provided by Europeana), are now widely available under open licenses. What is currently lacking is the wider availability of datasets from the GLAM sector that are appropriate for direct use in AI/ML research and development. The availability of such open datasets could not only help foster more engagement with digital cultural heritage data in AI/ML, but also support the transfer of recent advances in AI/ML to the field of digital curation and analysis of cultural heritage content. Conversely, further advances in AI/ML often go hand in hand with the release of new high-quality datasets.

EuropeanaTech therefore invites proposals for the assembly of suitable AI/ML datasets, drawing from the extensive collections on the Europeana website. We are seeking proposals for the creation of large, well-documented datasets that are shaped for direct takeup for AI/ML purposes (such as training a model) and that can be made publicly available on relevant online platforms under open licenses.

We will award each of the two winning proposals a financial stipend of €2,500 to support the production, documentation and publication of the datasets. Award winners will be invited to present their contributions at a future Europeana (online) event and to provide a text for publication related to their outputs.

How to apply

To apply, please read the submission guidelines below and submit a proposal by 15 February 2021, 23:59 CET. Proposals should describe in fewer than 1,500 words:

The intended contents of the dataset (in terms of volume, types of assets, annotation, etc.)
The procedure you intend to follow for producing the dataset
How it is relevant for AI/ML.

Proposals should also include a suggestion for a possible use case, supported by a pre-trained model with a demonstration or evaluation of its results. In case of acceptance, it must be feasible to produce and release the dataset and all necessary documentation and technical resources before 30 June 2021.

European cultural heritage collections are commonly subject to biases and entail ethical issues. While this can negatively impact AI and machine learning solutions, AI and machine learning could also be used to uncover these issues. These issues might not be overcome within the scope of this call, but we advise you to document and discuss them.

Submission guidelines

The datasets MUST:

Be drawn from data included in the various collections provided through Europeana;
Only include metadata that is either created by you or comes from Europeana. The resulting metadata must be licensed under Creative Commons Zero;
Be compiled in a machine-readable format including documentation and provenance;
Not have been published before. If a dataset has been published previously, the proposal must detail how the new dataset will be improved and used;
Include a description of one or more intended use cases of the dataset.
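
As one possible reading of the requirements above, a dataset could ship with a small machine-readable manifest that records its documentation, provenance, license and intended use cases. The sketch below is only an illustration: the call does not define a manifest schema, so every field name, the `validate_manifest` helper and all values in the example are hypothetical placeholders.

```python
import json

# Canonical URI of the Creative Commons Zero (CC0 1.0) dedication.
CC0_URI = "https://creativecommons.org/publicdomain/zero/1.0/"

# Hypothetical field names; the call does not prescribe a schema.
REQUIRED_FIELDS = ("title", "description", "metadata_license",
                   "provenance", "use_cases", "records")


def validate_manifest(manifest):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in manifest]
    if manifest.get("metadata_license") != CC0_URI:
        problems.append("metadata must be licensed under CC0")
    if not manifest.get("use_cases"):
        problems.append("at least one intended use case is required")
    return problems


manifest = {
    "title": "Example dataset (placeholder)",
    "description": "Placeholder description of contents and annotation.",
    "metadata_license": CC0_URI,
    "provenance": {
        "source": "Europeana",
        "retrieved_on": "2021-01-31",  # placeholder date
        "selection_procedure": "Placeholder: how records were chosen.",
    },
    "use_cases": ["Placeholder: image classification"],
    "records": [
        # Placeholder identifier, not a real Europeana record.
        {"id": "example-record-0001", "label": "placeholder-label"},
    ],
}

serialised = json.dumps(manifest, indent=2)  # machine-readable JSON text
print(validate_manifest(manifest))           # prints []
```

JSON is used here only because it is widely supported; CSV with an accompanying README, or any other documented machine-readable format, would serve the same purpose.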

The datasets SHOULD:

Only include media assets with a license compatible with Europeana Publishing Framework content tier 3;
Clarify the relation with and contribution to AI and ML best-practices and state-of-the-art within digital cultural heritage;
Include a pre-trained model resulting from applying a baseline ML/AI method to (one of) the intended use cases, and a demo of using this model or an evaluation of its results;
Document or discuss potential ethical issues and biases.

The datasets MAY:

Include additional curatorial enrichments and improvements such as data annotation, labeling or cross-referencing with other (digital) resources, under the condition that these are completed before dataset release and that appropriate quality control measures are applied;
Form part of a publication in a peer-reviewed journal or conference.

Basic documentation for technical solutions should be provided and any software produced must be released under an open source license.

Key dates

Call opens: 8 January 2021
Deadline for submissions: 15 February 2021, 23:59 CET
Notification of acceptances: 1 March 2021
Publication of dataset: 30 June 2021

Award criteria

Submissions will be reviewed by the EuropeanaTech AI in GLAMs Task Force and the EuropeanaTech community Steering Group based on:

Relevance of the use case for the GLAM community: 25%
Relevance of the dataset for AI/ML in relation to the use case: 25%
Clear definition of the use case/demo in relation to Europeana: 30%
Clarity in the description of how the dataset is produced: 20%
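
To show how the four weights could combine into a single score, here is a minimal sketch. The weights come from the list above; the 0 to 10 rating scale, the short criterion keys and the example ratings are assumptions, since the call does not say how reviewers rate each criterion.

```python
# Weights in percent, taken from the award criteria above.
WEIGHTS = {
    "use_case_relevance": 25,   # relevance of the use case for GLAMs
    "dataset_relevance": 25,    # relevance of the dataset for AI/ML
    "use_case_definition": 30,  # clear definition of the use case/demo
    "description_clarity": 20,  # clarity of the dataset description
}


def weighted_score(ratings):
    """Combine per-criterion ratings (assumed 0-10 scale) into one score."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the four criteria")
    total = sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


# Hypothetical ratings for one proposal.
example = {
    "use_case_relevance": 8,
    "dataset_relevance": 6,
    "use_case_definition": 9,
    "description_clarity": 7,
}

print(weighted_score(example))  # prints 7.6
```

Because the definition of the use case carries the largest weight (30%), a one-point change there moves the total more than a one-point change on any other criterion.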

Eligibility

Formally, the funds will not be allocated to individuals but to institutions, which can be cultural heritage or research institutions, including universities. A representative of each awardee institution will be asked to sign a subcontract with the Europeana Foundation.
Applicants must be based in an EU member state.
Applicants must be members of the EuropeanaTech community and the Europeana Network Association. If you are not already a member, you can find out how to join.
The award is a gross amount and therefore includes VAT.
Europeana DSI-4 project partners are not eligible for funding. The full list is available here.